Abstract



We consider the problem of estimating the proportion $\theta$ of true null hypotheses in a multiple testing context. The setup is classically modeled through a semiparametric mixture with two components: a uniform distribution on the interval $[0,1]$ with prior probability $\theta$ and a nonparametric density $f$. We discuss asymptotic efficiency results and establish that two different cases occur, depending on whether or not $f$ vanishes on a set with non-null Lebesgue measure. In the first case, we exhibit estimators converging at parametric rate, compute the optimal asymptotic variance and conjecture that no estimator is asymptotically efficient (i.e. attains the optimal asymptotic variance). In the second case, we prove that the quadratic risk of any estimator does not converge at parametric rate. We illustrate those results on simulated data.

1 Introduction

The problem of estimating the proportion of true null hypotheses is of interest in situations where several thousand (independent) hypotheses are tested simultaneously. A typical application in which multiple testing problems occur is estimating the proportion of genes that are not differentially expressed in deoxyribonucleic acid (DNA) microarray experiments (see for instance Dudoit and van der Laan, 2008). Among other application domains, we mention astrophysics (Meinshausen and Rice, 2006) and neuroimaging (Turkheimer et al., 2001). A reliable estimate of $\theta$ is important when one wants to control multiple error rates, such as the false discovery rate (FDR) introduced by Benjamini and Hochberg (1995). In this work, we discuss asymptotic efficiency of estimators of the true proportion of null hypotheses. We stress that the asymptotic framework is particularly relevant in the above-mentioned contexts, where the number of tested hypotheses is huge.

In many recent articles (such as Broberg, 2005; Celisse and Robin, 2010; Efron, 2004; Efron et al., 2001; Genovese and Wasserman, 2004, etc.), a two-component mixture density is used to model the behavior of $p$-values $X_1, X_2, \ldots, X_n$ associated with $n$ independent tested hypotheses. More precisely, assume the test statistics are independent and identically distributed (iid) with a continuous distribution under the corresponding null hypotheses; then the $p$-values $X_1, X_2, \ldots, X_n$ are iid and follow the uniform distribution $\mathcal{U}([0,1])$ on the interval $[0,1]$ under the null hypotheses. The density $g$ of the $p$-values is modeled by a two-component mixture with the following expression

$$\forall x \in [0,1], \quad g(x) = \theta + (1-\theta) f(x), \tag{1}$$

where $\theta \in [0,1]$ is the unknown proportion of true null hypotheses and $f$ denotes the density of $p$-values generated under the alternative (false null hypotheses).
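To make the mixture model concrete, the following minimal sketch draws $p$-values with density $\theta + (1-\theta) f$. The alternative density $f(x) = 2(1-x)$ and all function names are illustrative assumptions, not taken from the text.

```python
import random

def sample_alternative(rng):
    # Illustrative assumption: f(x) = 2(1 - x) on [0, 1].
    # Its cdf is F(x) = 1 - (1 - x)^2, so inversion gives x = 1 - sqrt(1 - u).
    return 1.0 - (1.0 - rng.random()) ** 0.5

def sample_p_values(n, theta, rng):
    """Draw n iid p-values with density g(x) = theta + (1 - theta) f(x)."""
    sample = []
    for _ in range(n):
        if rng.random() < theta:
            sample.append(rng.random())             # true null: uniform on [0, 1]
        else:
            sample.append(sample_alternative(rng))  # false null: density f
    return sample

rng = random.Random(0)
p_values = sample_p_values(10_000, theta=0.8, rng=rng)
sample_mean = sum(p_values) / len(p_values)
print(sample_mean)
```

Under these assumptions the mean of $g$ is $\theta/2 + (1-\theta)/3$, so the printed value should be close to that quantity for large samples.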

Many different identifiability conditions on the parameter $(\theta, f)$ in model (1) have been discussed in the literature. For example, Genovese and Wasserman (2004) introduce the concept of purity, which corresponds to the case where the essential infimum of $f$ on $[0,1]$ is zero. They prove that purity implies identifiability but not vice versa. Langaas et al. (2005) suppose that $f$ is decreasing with $f(1) = 0$, while Neuvial (2010) assumes that $f$ is regular near $x = 1$ with $f(1) = 0$, and Celisse and Robin (2010) consider that $f$ vanishes on a whole interval included in $[0,1]$. These are sufficient but not necessary conditions on $f$ that ensure identifiability. Now, if we assume more generally that $f$ belongs to some set $\mathcal{F}$ of densities on $[0,1]$, then a necessary and sufficient condition for parameter identifiability is stated in the next result, whose proof is given in Section 5.1.

Proposition 1. The parameter $(\theta, f)$ is identifiable on a set $(0,1) \times \mathcal{F}$ if and only if for all $f \in \mathcal{F}$ and for all $c \in (0,1)$, we have $c + (1-c) f \notin \mathcal{F}$.
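The condition of Proposition 1 can be illustrated numerically. In the sketch below, the densities `f0` and `f1` and the numerical values are hypothetical choices made only for illustration: a set of densities containing both `f0` and `f1 = c + (1 - c) f0` violates the condition, and two different parameters then produce the same mixture density.

```python
def mixture(theta, f):
    """Return the density g = theta + (1 - theta) * f of model (1)."""
    return lambda x: theta + (1.0 - theta) * f(x)

# Hypothetical densities on [0, 1], chosen only for illustration.
def f0(x):
    return 2.0 * (1.0 - x)          # essential infimum equal to 0

c = 0.5

def f1(x):
    return c + (1.0 - c) * f0(x)    # essential infimum equal to c > 0

theta1 = 0.4
theta0 = theta1 + (1.0 - theta1) * c   # a different proportion (0.7 here)

g1 = mixture(theta1, f1)
g0 = mixture(theta0, f0)

# The two parameters (theta1, f1) and (theta0, f0) give the same density g.
grid = [i / 100 for i in range(101)]
max_gap = max(abs(g1(x) - g0(x)) for x in grid)
print(max_gap)
```

The gap is zero up to rounding, so no amount of data can distinguish the two parameters unless the set of admissible densities excludes one of `f0` and `f1`.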

This very general result is the starting point for considering explicit sets $\mathcal{F}$ of densities that ensure the parameter's identifiability on $(0,1) \times \mathcal{F}$. In particular, if $\mathcal{F}$ is a set of densities constrained to have essential infimum equal to zero, one recovers the purity result of Genovese and Wasserman (2004). However, from an estimation perspective, the purity assumption is very weak and it is hopeless to obtain a reliable estimate of $\theta$ based on the value of $f$ at a unique value (or at a finite number of values). In the following, we explore asymptotic efficiency results for the estimation of $\theta$ and establish that two different cases are to be distinguished: models assuming that $f$ vanishes on a set of points with positive Lebesgue measure and models where this set of points has zero measure (and where regularity or monotonicity assumptions are added on $f$). In the first case, we obtain the existence of $\sqrt{n}$-consistent estimators of $\theta$, that is to say estimators $\hat\theta_n$ such that $\sqrt{n}(\hat\theta_n - \theta)$ is bounded in probability (denoted by $\sqrt{n}(\hat\theta_n - \theta) = O_{\mathbb{P}}(1)$). We exhibit such estimators and also compute the asymptotic optimal variance for this problem. Moreover, we conjecture that asymptotically efficient estimators (that is, estimators asymptotically attaining this variance lower bound) do not exist. In the second case, while the existence of an estimator $\hat\theta_n$ of $\theta$ converging at parametric rate has not been established yet, we prove that if such a $\sqrt{n}$-consistent estimator of $\theta$ exists, then the variance $\mathrm{Var}(\sqrt{n}\,\hat\theta_n)$ cannot have a finite limit. In other words, the quadratic risk of $\hat\theta_n$ cannot converge to zero at a parametric rate.




Let us now discuss the different estimators of $\theta$ proposed in the literature, starting with those assuming (implicitly or not) that $f$ attains its minimum value on a whole interval. First, Schweder and Spjøtvoll (1982) suggested a procedure to estimate $\theta$, which was later used by Storey (2002). This estimator depends on an unspecified parameter $\lambda \in [0,1)$ and is equal to the proportion of $p$-values larger than this threshold $\lambda$ divided by $1 - \lambda$. It is thus consistent only if $f$ attains its minimum value on the interval $[\lambda, 1]$ (an assumption made neither in the article by Schweder and Spjøtvoll (1982) nor in the one by Storey (2002)). Note that even if such an assumption were made, it would not solve the problem of choosing $\lambda$ such that $f$ attains its infimum on $[\lambda, 1]$. Adapting this procedure in order to end up with an estimate of the positive FDR (pFDR), Storey (2002) proposes a bootstrap strategy to pick $\lambda$. More precisely, his procedure minimizes the mean squared error for estimating the pFDR. Note that Genovese and Wasserman (2004) established that, for a fixed value $\lambda$ such that the cumulative distribution function (cdf) $F$ of $f$ satisfies $F(\lambda) < 1$, Storey's estimator converges at parametric rate and is asymptotically normal, but is also asymptotically biased: thus it does not converge to $\theta$ at parametric rate. Some other choices of $\lambda$ are, for instance, based on break point estimation (Turkheimer et al., 2001) or spline smoothing (Storey and Tibshirani, 2003). Another natural class of procedures in this context is obtained by relying on a histogram estimator of $g$ (Mosig et al., 2001; Nettleton et al., 2006). Among this kind of procedures, we mention the one proposed recently by Celisse and Robin (2010), who proved convergence in probability of their estimator (to the true parameter value) under the assumption that $f$ vanishes on an interval. Note that both Storey's and histogram based estimators of $\theta$ are constructed using nonparametric estimates $\hat g$ of the density $g$ and then estimate $\theta$ relying on the value of $\hat g$ on a specific interval. The main issue with those procedures is to automatically select an interval where the true density $g$ is identically equal to $\theta$. As a conclusion on the existing results for this setup ($f$ vanishing on a set of points with non-null Lebesgue measure), we stress the fact that none of these estimators were proven to be convergent to $\theta$ at parametric rate. In Proposition 2 below, we prove that a very simple histogram based estimator possesses this property, while in Proposition 3, we establish that this is also true for the more elaborate procedure proposed by Celisse and Robin (2010), which has the advantage of automatically selecting the "best" partition among a fixed collection. However, we are not aware of a procedure for estimating $\theta$ that asymptotically attains the optimal variance in this context. Besides, one might conjecture that such a procedure does not exist for regular models (see Section 3.3).
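The estimator of Schweder and Spjøtvoll described above is simple enough to sketch directly. The sampler below is only an illustrative assumption (alternative $p$-values uniform on $[0, 1-\delta]$, so that $f$ vanishes on $[1-\delta, 1]$), as are the function names and numerical values.

```python
import random

def storey_estimator(p_values, lam):
    """Proportion of p-values larger than lam, divided by (1 - lam)."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    n_above = sum(1 for p in p_values if p > lam)
    return n_above / (len(p_values) * (1.0 - lam))

def sample_p_values(n, theta, delta, rng):
    # Illustrative assumption: the alternative is uniform on [0, 1 - delta],
    # so f vanishes on [1 - delta, 1].
    return [rng.random() if rng.random() < theta
            else (1.0 - delta) * rng.random()
            for _ in range(n)]

rng = random.Random(0)
sample = sample_p_values(20_000, theta=0.7, delta=0.5, rng=rng)

estimate_good = storey_estimator(sample, lam=0.5)  # f vanishes on [0.5, 1]
estimate_bad = storey_estimator(sample, lam=0.2)   # f is positive on part of [0.2, 1]
print(estimate_good, estimate_bad)
```

In this setup the first value should be close to $\theta = 0.7$, while the second targets $(1 - G(\lambda))/(1 - \lambda) = 0.925$ rather than $\theta$, which illustrates the asymptotic bias for a badly chosen $\lambda$.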

Other estimators of $\theta$ are based on regularity or monotonicity assumptions made on $f$ or equivalently on $g$, combined with the assumption that the infimum of $g$ is attained at $x = 1$. These estimators rely on nonparametric estimates of $g$ and appear to inherit nonparametric rates of convergence. Langaas et al. (2005) derive estimators based on nonparametric maximum likelihood estimation of the $p$-value density, in two setups: decreasing and convex decreasing densities $f$. We mention that no theoretical properties of these estimators are given. Hengartner and Stark (1995) propose a very general finite sample confidence envelope for a monotone density. Relying on this result and assuming moreover that the cdf $G$ of $g$ is concave and that $g$ is Lipschitz in a neighborhood of $x = 1$, Genovese and Wasserman (2004) construct an estimator converging to $g(1) = \theta$ at rate $(\log n)^{1/3} n^{-1/3}$. Under some regularity assumptions on $f$ near $x = 1$, Neuvial (2010) establishes that by letting $\lambda \to 1$, Storey's estimator may be turned into a consistent estimator of $\theta$, with a nonparametric rate of convergence equal to $n^{-k/(2k+1)} \eta_n$, where $\eta_n \to +\infty$ and $k$ controls the regularity of $f$ near $x = 1$. Our results are in accordance with the literature: no $\sqrt{n}$-consistent estimator has been constructed yet, as is expected from the fact that the quadratic risk of any estimator of $\theta$ cannot converge at parametric rate in this case (see Corollary 1).

To finish this tour of the literature about the estimation of $\theta$, we mention that Meinshausen and Bühlmann (2005) discuss probabilistic lower bounds for the proportion of true null hypotheses, which are valid under general and unknown dependence structures between the test statistics. Finally, note that we do not discuss here estimators of the proportion of non-null effects in Gaussian mixtures such as in Cai and Jin (2010); Jin (2008); Jin and Cai (2007), a related but different problem from the one we study.

The article is organized as follows. Section 2 establishes lower bounds on the quadratic risk for the estimation of $\theta$, while Section 3 explores corresponding upper bounds, i.e. the existence of $\sqrt{n}$-consistent estimators of $\theta$ and the existence of asymptotically efficient estimators. Section 4 illustrates our results relying on simulations. The proofs of the main results are postponed to Section 5, while some technical lemmas are proved in Appendix A.

2 Lower bounds for the quadratic risk and efficiency 

In this section, we give lower bounds for the quadratic risk of any estimator of $\theta$. For any fixed unknown parameter $\delta \in [0,1)$, we introduce a set of densities $\mathcal{F}_\delta$ (with respect to the Lebesgue measure $\mu$) and an induced set of semiparametric distributions $\mathcal{P}_\delta$, respectively defined as

$$\mathcal{F}_\delta = \big\{ f : [0,1] \to \mathbb{R}^+,\ \text{continuously non increasing density, positive on } [0, 1-\delta) \text{ and such that } f_{|[1-\delta,1]} = 0 \big\},$$

$$\mathcal{P}_\delta = \Big\{ \mathbb{P}_{\theta,f} ;\ \frac{d\mathbb{P}_{\theta,f}}{d\mu} = \theta + (1-\theta) f ;\ (\theta, f) \in (0,1) \times \mathcal{F}_\delta \Big\}.$$

Note that for any fixed value $\delta \in [0,1)$, the condition stated in Proposition 1 is satisfied on the set $\mathcal{F}_\delta$, namely for all $f \in \mathcal{F}_\delta$ and for all $c \in (0,1)$, we have $c + (1-c) f \notin \mathcal{F}_\delta$. Thus, the parameter $(\theta, f)$ is identifiable on $(0,1) \times \mathcal{F}_\delta$.

The case $\delta > 0$ corresponds to models where the density $f$ is supposed to vanish on a set of points with non-null Lebesgue measure. This case is thus easier from an estimation perspective. Note that when $\delta = 0$, it is usual to add assumptions on $f$. Here, we choose to consider the case where $f$ is assumed to be non increasing. The same results may be obtained by replacing this assumption with a regularity constraint on $f$. Note also that when $\delta > 0$, the assumption that $f$ is non increasing could be removed without any change in the results.

We aim at computing the (asymptotic) efficient information for estimating the finite dimensional parameter $\psi(\mathbb{P}_{\theta,f}) = \theta$ in model $\mathcal{P}_\delta$, where we consider $f$ as a nuisance parameter. We start by recalling some concepts from semiparametric theory and give explicit expressions of the objects arising from this theory in our specific framework. We follow the notation of Chapter 25 and more particularly Section 25.4 in van der Vaart (1998) and refer to this book for more details.

We fix a parameter value $(\theta, f)$ and consider first a parametric submodel of $\mathcal{F}_\delta$ induced by the following path

$$t \mapsto f_t(x) = \frac{k(t h_0(x)) f(x)}{\int k(t h_0(u)) f(u)\,du} = c(t)\, k(t h_0(x)) f(x), \tag{2}$$

where $h_0$ is a continuous and non increasing function on $[0,1]$, the function $k$ is defined by $k(u) = 2(1 + e^{-2u})^{-1}$ and the normalising constant $c(t)$ satisfies $c(t)^{-1} = \int k(t h_0(u)) f(u)\,du$. A tangent set ${}_f\dot{\mathcal{P}}_\delta$ is composed of the score functions associated with such parametric submodels (as $h_0$ varies). It is easy to see that the path (2) is differentiable and that its corresponding score function is obtained by differentiating $t \mapsto \log[\theta + (1-\theta) f_t(x)]$ at $t = 0$. We thus obtain a tangent set for $f$ given by

$${}_f\dot{\mathcal{P}}_\delta = \Big\{ h = \frac{(1-\theta) f h_0}{\theta + (1-\theta) f} ;\ h_0 \text{ is continuous and non increasing on } [0, 1-\delta) \text{ with } \int f h_0 = 0 \Big\}.$$

Now, we consider parametric submodels of $\mathcal{P}_\delta$ induced by paths of the form $t \mapsto \mathbb{P}_{\theta + ta, f_t}$, where the paths $t \mapsto f_t$ in $\mathcal{F}_\delta$ are given by (2). We remark that if $\dot{l}_{\theta,f}$ is the ordinary score function for $\theta$ in the model in which $f$ is fixed, then for every $a \in \mathbb{R}$ and for every $h \in {}_f\dot{\mathcal{P}}_\delta$, the function $a \dot{l}_{\theta,f} + h$ is a score function for $(\theta, f)$ corresponding to the path $t \mapsto \mathbb{P}_{\theta + ta, f_t}$. Hence, a tangent set $\dot{\mathcal{P}}_\delta$ of the model $\mathcal{P}_\delta$ at $\mathbb{P}_{\theta,f}$ with respect to the parameter $(\theta, f)$ is given by the linear span

$$\dot{\mathcal{P}}_\delta = \mathrm{lin}\big(\dot{l}_{\theta,f} + {}_f\dot{\mathcal{P}}_\delta\big) = \big\{ \alpha \dot{l}_{\theta,f} + \beta h ;\ (\alpha, \beta) \in \mathbb{R}^2,\ h \in {}_f\dot{\mathcal{P}}_\delta \big\}.$$

Moreover, the ordinary score function for $\theta$ in the model in which $f$ is fixed is given by

$$\dot{l}_{\theta,f}(x) = \frac{1 - f(x)}{\theta + (1-\theta) f(x)}.$$
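Now we let $\tilde{l}_{\theta,f}$ be the efficient score function and $\tilde{I}_{\theta,f}$ be the efficient information for estimating $\psi(\mathbb{P}_{\theta,f}) = \theta$. These quantities are defined respectively as

$$\tilde{l}_{\theta,f} = \dot{l}_{\theta,f} - \Pi_{\theta,f}\, \dot{l}_{\theta,f} \quad \text{and} \quad \tilde{I}_{\theta,f} = \mathbb{P}_{\theta,f}\big(\tilde{l}_{\theta,f}^{\,2}\big),$$

where $\Pi_{\theta,f}$ is the orthogonal projection onto the closure of the linear span of ${}_f\dot{\mathcal{P}}_\delta$ in $L_2(\mathbb{P}_{\theta,f})$. The functional $\psi : \mathbb{P}_{\theta,f} \mapsto \theta$ is said to be differentiable at $\mathbb{P}_{\theta,f}$ relative to the tangent set $\dot{\mathcal{P}}_\delta$ if there exists a continuous linear map $\tilde\psi_{\theta,f} : L_2(\mathbb{P}_{\theta,f}) \to \mathbb{R}$, called the efficient influence function, such that for every path $t \mapsto f_t$ with score function $h \in {}_f\dot{\mathcal{P}}_\delta$, we have

$$\forall a \in \mathbb{R}, \quad a = \int \tilde\psi_{\theta,f}(x) \big[ a \dot{l}_{\theta,f}(x) + h(x) \big]\, d\mathbb{P}_{\theta,f}(x).$$

Setting $a = 0$, we see that this efficient influence function must be orthogonal to the tangent set ${}_f\dot{\mathcal{P}}_\delta$. Finally, note that under some assumptions, the efficient influence function $\tilde\psi_{\theta,f}$ equals $\tilde{I}_{\theta,f}^{-1} \tilde{l}_{\theta,f}$ (see Lemma 25.25 in van der Vaart, 1998). The following theorem provides expressions for these quantities in our setup. All the proofs in the current section are postponed to Section 5.1.

Theorem 1. The efficient score function $\tilde{l}_{\theta,f}$ and the efficient information $\tilde{I}_{\theta,f}$ for estimating $\theta$ in model $\mathcal{P}_\delta$ are given by

$$\tilde{l}_{\theta,f}(x) = \frac{1}{\theta} - \frac{1}{\theta(1 - \theta\delta)}\, \mathbf{1}_{[0, 1-\delta)}(x) \quad \text{and} \quad \tilde{I}_{\theta,f} = \frac{\delta}{\theta(1 - \theta\delta)}, \tag{4}$$

where $\mathbf{1}_A(\cdot)$ is the indicator function of set $A$. In particular, when $\delta = 0$, this efficient information is zero. In this case, the functional $\psi(\mathbb{P}_{\theta,f}) = \theta$ is not differentiable at $\mathbb{P}_{\theta,f}$ relative to the tangent set $\dot{\mathcal{P}}_0$.

Moreover, when $\delta > 0$, the efficient influence function $\tilde\psi_{\theta,f}$ relative to the tangent set $\dot{\mathcal{P}}_\delta$ is given by

$$\tilde\psi_{\theta,f}(x) = \frac{1}{\delta}\, \mathbf{1}_{[1-\delta, 1]}(x) - \theta.$$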
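As a sanity check on the expressions of Theorem 1, the short sketch below verifies numerically that the efficient score is centered, that its second moment equals $\delta / (\theta(1-\theta\delta))$, and that the inverse information equals $\theta(\delta^{-1} - \theta)$. It only uses the fact that the interval $[1-\delta, 1]$ has mass $\theta\delta$ under $\mathbb{P}_{\theta,f}$; the function names and the numerical values are illustrative assumptions.

```python
def efficient_score(x, theta, delta):
    """Efficient score: 1/theta - 1_{[0, 1-delta)}(x) / (theta (1 - theta delta))."""
    inside = 1.0 if x < 1.0 - delta else 0.0
    return 1.0 / theta - inside / (theta * (1.0 - theta * delta))

def efficient_information(theta, delta):
    """Efficient information: delta / (theta (1 - theta delta))."""
    return delta / (theta * (1.0 - theta * delta))

def score_moments(theta, delta):
    # The score takes two values: one on [0, 1-delta), one on [1-delta, 1].
    # Since f vanishes on [1-delta, 1], that interval has mass theta * delta.
    mass_tail = theta * delta
    score_tail = efficient_score(1.0 - delta / 2.0, theta, delta)
    score_body = efficient_score(0.0, theta, delta)
    mean = mass_tail * score_tail + (1.0 - mass_tail) * score_body
    second = mass_tail * score_tail ** 2 + (1.0 - mass_tail) * score_body ** 2
    return mean, second

theta, delta = 0.7, 0.3
mean, second = score_moments(theta, delta)
info = efficient_information(theta, delta)
optimal_variance = 1.0 / info
print(mean, second, info, optimal_variance)
```

The optimal variance $\theta(\delta^{-1} - \theta)$ grows without bound as $\delta \to 0$, in line with the vanishing information in the case $\delta = 0$.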

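This theorem has some consequences on the quadratic risk of any estimator that we now explain. For every score function $g$ in the tangent set $\dot{\mathcal{P}}_\delta$, we write $P_{t,g}$ for a path with score function $g$ along which the functional $\psi : \mathbb{P}_{\theta,f} \mapsto \theta$ is differentiable. Namely, $P_{t,g}$ takes the form $\mathbb{P}_{\theta + ta, f_t}$ for some path $t \mapsto f_t$ and some $a \in \mathbb{R}$. Now, an estimator sequence $\hat\theta_n$ is called regular at $\mathbb{P}_{\theta,f}$ for estimating $\theta$ (relative to the tangent set $\dot{\mathcal{P}}_\delta$) if there exists a probability measure $L$ such that for any score function $g \in \dot{\mathcal{P}}_\delta$ corresponding to a path of the form $t \mapsto (\theta + ta, f_t)$, we have

$$\sqrt{n}\big(\hat\theta_n - \psi(P_{1/\sqrt{n}, g})\big) = \sqrt{n}\Big(\hat\theta_n - \Big(\theta + \frac{a}{\sqrt{n}}\Big)\Big) \xrightarrow{d} L, \quad \text{under } P_{1/\sqrt{n}, g},$$

where $\xrightarrow{d}$ denotes convergence in distribution. According to a convolution theorem (see Theorem 25.20 in van der Vaart, 1998), this limit distribution writes as the convolution between some unknown distribution and the centered Gaussian distribution $\mathcal{N}(0, \mathbb{P}_{\theta,f}(\tilde\psi_{\theta,f}^{\,2}))$ with variance

$$\mathbb{P}_{\theta,f}\big(\tilde\psi_{\theta,f}^{\,2}\big) = \int \tilde\psi_{\theta,f}^{\,2}\, d\mathbb{P}_{\theta,f}.$$

Thus we say that an estimator sequence is asymptotically efficient at $\mathbb{P}_{\theta,f}$ (relative to the tangent set $\dot{\mathcal{P}}_\delta$) if it is regular at $\mathbb{P}_{\theta,f}$ with limit distribution $L = \mathcal{N}(0, \mathbb{P}_{\theta,f}(\tilde\psi_{\theta,f}^{\,2}))$; in other words, it is the best regular estimator. The quadratic risk of an estimator sequence $\hat\theta_n$ (relative to the tangent set $\dot{\mathcal{P}}_\delta$) is defined as

$$\sup_{E_\delta} \liminf_{n \to \infty} \sup_{g \in E_\delta} \mathbb{E}_{P_{1/\sqrt{n}, g}} \big[ \sqrt{n}\big(\hat\theta_n - \psi(P_{1/\sqrt{n}, g})\big) \big]^2,$$

where the first supremum is taken over all finite subsets $E_\delta$ of the tangent set $\dot{\mathcal{P}}_\delta$. According to the local asymptotic minimax (LAM) theorem (see Theorem 25.21 in van der Vaart, 1998), this quantity is lower bounded by the minimal variance $\mathbb{P}_{\theta,f}(\tilde\psi_{\theta,f}^{\,2})$. Thus, Theorem 1 has the following corollary.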



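Corollary 1. When $\delta = 0$, any estimator sequence $\hat\theta_n$ has an infinite quadratic risk, namely

$$\sup_{E_0} \liminf_{n \to \infty} \sup_{g \in E_0} \mathbb{E}_{P_{1/\sqrt{n}, g}} \big[ \sqrt{n}\big(\hat\theta_n - \psi(P_{1/\sqrt{n}, g})\big) \big]^2 = +\infty,$$

where the first supremum is taken over all finite subsets $E_0$ of the tangent set $\dot{\mathcal{P}}_0$.
When $\delta > 0$, we obtain that

i) For any estimator sequence $\hat\theta_n$ we have

$$\sup_{E_\delta} \liminf_{n \to \infty} \sup_{g \in E_\delta} \mathbb{E}_{P_{1/\sqrt{n}, g}} \big[ \sqrt{n}\big(\hat\theta_n - \psi(P_{1/\sqrt{n}, g})\big) \big]^2 \ge \theta\Big(\frac{1}{\delta} - \theta\Big),$$

where the first supremum is taken over all finite subsets $E_\delta$ of the tangent set $\dot{\mathcal{P}}_\delta$.

ii) A sequence of estimators $\hat\theta_n$ is asymptotically efficient in the sense of a convolution theorem (best regular estimator) if and only if it satisfies

$$\hat\theta_n = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\delta}\, \mathbf{1}_{X_i \in [1-\delta, 1]} + o_{\mathbb{P}_{\theta,f}}(n^{-1/2}). \tag{5}$$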

Remark 1. A) When $\delta = 0$, using Theorem 2 in Chamberlain (1986), we conclude that there is no regular estimator for $\theta$ relative to the tangent set $\dot{\mathcal{P}}_0$. This implies that if there exists a $\sqrt{n}$-consistent estimator in model $\mathcal{P}_0$, it cannot have finite asymptotic variance. In other words, we could have $\sqrt{n}(\hat\theta_n - \theta) = O_{\mathbb{P}}(1)$ for some estimator $\hat\theta_n$ but then $\mathrm{Var}(\sqrt{n}\,\hat\theta_n) \to +\infty$. However, we note that the only rates of convergence obtained until now in this case are nonparametric ones.

B) When $\delta > 0$, for a fixed parameter value $\lambda$ such that $G(\lambda) < 1$, Storey's estimator $\hat\theta^{Storey}(\lambda)$ satisfies

$$\sqrt{n}\Big( \hat\theta^{Storey}(\lambda) - \frac{1 - G(\lambda)}{1 - \lambda} \Big) \xrightarrow[n \to \infty]{d} \mathcal{N}\Big( 0, \frac{G(\lambda)(1 - G(\lambda))}{(1 - \lambda)^2} \Big)$$

(see for instance Genovese and Wasserman, 2004). In particular, if we assume that $f$ vanishes on $[\lambda, 1]$ then we obtain that $G(\lambda) = 1 - \theta(1 - \lambda)$ and $\hat\theta^{Storey}(\lambda)$ becomes a $\sqrt{n}$-consistent estimator of $\theta$, which is moreover asymptotically normally distributed, with asymptotic variance

$$\theta\Big( \frac{1}{1 - \lambda} - \theta \Big).$$

In this sense, the oracle version of Storey's estimator that picks $\lambda = 1 - \delta$ (namely choosing $\lambda$ as the smallest value such that $f$ vanishes on $[\lambda, 1]$) is asymptotically efficient. Note also that, for this oracle choice of $\lambda$, $\hat\theta^{Storey}(\lambda)$ automatically satisfies (5).
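The variance comparison in Remark 1 B) can be checked by a small Monte Carlo experiment. The sketch below is only illustrative: the alternative is assumed uniform on $[0, 1-\delta]$, and the sample size, number of replicates and values of $\lambda$ are arbitrary choices. It compares the empirical variance of $\sqrt{n}\,\hat\theta^{Storey}(\lambda)$ with $\theta(1/(1-\lambda) - \theta)$ for several $\lambda \ge 1 - \delta$.

```python
import random

def storey_estimator(p_values, lam):
    """Proportion of p-values larger than lam, divided by (1 - lam)."""
    n_above = sum(1 for p in p_values if p > lam)
    return n_above / (len(p_values) * (1.0 - lam))

def asymptotic_variance(theta, lam):
    """theta * (1 / (1 - lam) - theta), valid when f vanishes on [lam, 1]."""
    return theta * (1.0 / (1.0 - lam) - theta)

def sample_p_values(n, theta, delta, rng):
    # Illustrative assumption: the alternative is uniform on [0, 1 - delta].
    return [rng.random() if rng.random() < theta
            else (1.0 - delta) * rng.random()
            for _ in range(n)]

def monte_carlo_variance(theta, delta, lam, n, replicates, rng):
    """Empirical variance of sqrt(n) * Storey's estimator over independent samples."""
    values = [n ** 0.5 * storey_estimator(sample_p_values(n, theta, delta, rng), lam)
              for _ in range(replicates)]
    mean = sum(values) / replicates
    return sum((v - mean) ** 2 for v in values) / (replicates - 1)

rng = random.Random(1)
theta, delta, n = 0.7, 0.4, 500
results = {}
for lam in (1.0 - delta, 0.8, 0.9):
    results[lam] = (asymptotic_variance(theta, lam),
                    monte_carlo_variance(theta, delta, lam, n, 300, rng))
    print(lam, results[lam])
```

The theoretical variance is smallest at $\lambda = 1 - \delta$, where it coincides with the bound $\theta(\delta^{-1} - \theta)$, and increases as $\lambda$ moves toward 1.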



3 Upper bounds for the quadratic risk and efficiency (when $\delta > 0$)

In this section, we investigate the existence of asymptotically efficient estimators for $\theta$, in the case where $\delta > 0$. We consider histogram based estimators of $\theta$ where a nonparametric histogram estimator $\hat g$ of $g$ is combined with an interval selection that aims at picking an interval where $g$ is equal to $\theta$. We start by establishing the existence of $\sqrt{n}$-consistent estimators: a simple histogram based procedure is studied in Section 3.1 while a more elaborate one is the object of Section 3.2. Finally, in Section 3.3 we explain the general one-step method to construct an asymptotically efficient estimator relying on a $\sqrt{n}$-consistent procedure and discuss conditions under which an asymptotically efficient estimator could be obtained in model $\mathcal{P}_\delta$.

3.1 A histogram based estimator

Throughout this section and the following one, we assume that the density $f$ belongs to $L^2([0,1])$. Let $\hat g_I$ be a histogram estimator corresponding to a partition $I = (I_k)_{1, \ldots, D}$ of $[0,1]$, defined by

$$\hat g_I(x) = \sum_{k=1}^{D} \frac{n_k}{n |I_k|}\, \mathbf{1}_{I_k}(x),$$

where $n_k = \mathrm{card}\{i : X_i \in I_k\}$ is the number of observations in $I_k$ and $|I_k|$ is the width of interval $I_k$. We estimate $\theta$ by the minimal value of $\hat g_I$, that is

$$\hat\theta_{I,n} = \min_{1 \le k \le D} \frac{n_k}{n |I_k|} = \frac{n_{\hat k_n}}{n |I_{\hat k_n}|}, \tag{6}$$

where we let

$$\hat k_n \in \mathop{\mathrm{Argmin}}_{1 \le k \le D} \Big\{ \frac{n_k}{n |I_k|} = \frac{1}{n |I_k|} \sum_{i=1}^{n} \mathbf{1}_{X_i \in I_k} \Big\}.$$

Note that histogram estimators are natural nonparametric estimators for $g$ when assuming that $f \in \mathcal{F}_\delta$ with $\delta > 0$, that is, when $g$ is constant on an interval. It is easy to see that $\hat\theta_{I,n}$ is $\sqrt{n}$-consistent as soon as the partition $I$ is fine enough. We moreover establish that this estimator has a variance of the order $1/n$. The proof of this result appears in Section 5.2.

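Proposition 2. Fix $\delta > 0$ and suppose that $f \in \mathcal{F}_\delta$. Assume moreover that the partition $I$ is such that $\max_k |I_k|$ is small enough, then the estimator $\hat\theta_{I,n}$ has the following properties

i) $\hat\theta_{I,n}$ converges almost surely to $\theta$,

ii) $\hat\theta_{I,n}$ is $\sqrt{n}$-consistent, i.e. $\sqrt{n}(\hat\theta_{I,n} - \theta) = O_{\mathbb{P}}(1)$,

iii) $\limsup_{n \to \infty} \mathrm{Var}(\sqrt{n}\,\hat\theta_{I,n}) < +\infty$.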

Note that while $\sqrt{n}$-consistency and a control of the variance of $\sqrt{n}\,\hat\theta_{I,n}$ are proved in the above proposition, asymptotic normality of $\hat\theta_{I,n}$ or the value of its asymptotic variance are difficult to obtain. Indeed, for any deterministic interval $I_k$, the central limit theorem (CLT) applies to the estimator $n_k / (n |I_k|)$. However, a histogram based estimator such as $\hat\theta_{I,n}$ is based on the selection of a random interval $\hat I$ and the CLT fails to apply directly to $n_{\hat I} / (n |\hat I|)$. Note also that the choice of the partition $I$ is not solved here. From a practical point of view, decreasing the parameter $\max_k |I_k|$ will in fact increase the variance of the estimator. In the next section, we study a procedure that automatically selects the best partition among a given collection.
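The estimator (6) is easy to sketch for a regular partition. In the code below, the use of a regular partition with `n_bins` cells, the uniform alternative on $[0, 1-\delta]$ and the numerical values are all illustrative assumptions.

```python
import random

def histogram_min_estimator(p_values, n_bins):
    """Minimum over a regular partition of n_k / (n |I_k|), as in (6)."""
    n = len(p_values)
    counts = [0] * n_bins
    for p in p_values:
        k = min(int(p * n_bins), n_bins - 1)   # p = 1.0 falls in the last cell
        counts[k] += 1
    width = 1.0 / n_bins
    return min(c / (n * width) for c in counts)

def sample_p_values(n, theta, delta, rng):
    # Illustrative assumption: the alternative is uniform on [0, 1 - delta].
    return [rng.random() if rng.random() < theta
            else (1.0 - delta) * rng.random()
            for _ in range(n)]

rng = random.Random(3)
sample = sample_p_values(20_000, theta=0.7, delta=0.4, rng=rng)

estimate_coarse = histogram_min_estimator(sample, n_bins=5)
estimate_fine = histogram_min_estimator(sample, n_bins=50)
print(estimate_coarse, estimate_fine)
```

Because the 50-cell partition refines the 5-cell one, its minimum can only be smaller; taking a minimum over many noisy cells pulls the estimate downward, which is one way to see the cost of an overly fine partition.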



3.2 Celisse and Robin (2010)'s procedure

We recall here the procedure for estimating $\theta$ that is presented in Celisse and Robin (2010). It relies on an elaborate histogram approach that selects the best partition among a given collection. As will be seen from the simulation experiments (Section 4), its asymptotic variance is likely to be smaller than for the previous estimator, justifying our interest in this procedure. Unfortunately, from a theoretical point of view, we only establish that this estimator should be as good as the previous one. Note that since not many estimators of $\theta$ have been proved to be $\sqrt{n}$-convergent, this is already a non trivial result.

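For a given integer $M$, define $\mathcal{I}_M$ as the set of partitions of $[0,1]$ such that for some integers $k, l$ with $2 \le k + 2 \le l \le M$, the first $k$ intervals and the last $M - l$ ones are regular of width $1/M$, namely

$$\mathcal{I}_M = \Big\{ I = (I_i)_i : \forall i \ne k+1,\ |I_i| = \frac{1}{M},\ |I_{k+1}| = \frac{l - k}{M},\ 2 \le k + 2 \le l \le M \Big\}.$$

These partitions are motivated by the assumption that $f$ vanishes on a set $[\lambda, \mu] \subset [0,1]$. Then for two given integers $m_{\min} < m_{\max}$, denote by $\mathcal{I}$ the following collection of partitions

$$\mathcal{I} = \bigcup_{m_{\min} \le m \le m_{\max}} \mathcal{I}_{2^m}. \tag{7}$$

Every partition $I$ in $\mathcal{I}$ is characterized by a triplet $(M = 2^m, \lambda = k/M, \mu = l/M)$ and the quality of the histogram estimator $\hat g_I$ is measured by its quadratic risk. So in this sense, the oracle estimator $\hat g_{I^\star}$ is obtained through

$$I^\star = \mathop{\mathrm{argmin}}_{I \in \mathcal{I}} \mathbb{E}\big[ \| g - \hat g_I \|_2^2 \big] = \mathop{\mathrm{argmin}}_{I \in \mathcal{I}} R(I), \quad \text{where } R(I) = \mathbb{E}\Big[ \| \hat g_I \|_2^2 - 2 \int \hat g_I(x) g(x)\,dx \Big].$$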

However, for every partition $I$, the quantity $R(I)$ depends on $g$, which is unknown. Thus $I^\star$ is an oracle and not an estimator. It is then natural to replace $R(I)$ by an estimator. In Celisse and Robin (2008, 2010), the authors use the leave-$p$-out (LPO) estimator of $R(I)$ with $p \in \{1, \ldots, n-1\}$, whose expression is given by (see Celisse and Robin, 2008, Theorem 2.1)

$$\hat R_p(I) = \frac{2n - p}{(n-1)(n-p)} \sum_{k} \frac{n_k}{n |I_k|} - \frac{n(n - p + 1)}{(n-1)(n-p)} \sum_{k} \frac{1}{|I_k|} \Big( \frac{n_k}{n} \Big)^2. \tag{8}$$

The best theoretical value of $p$ is the one that minimizes the mean squared error (MSE) of $\hat R_p(I)$, namely

$$p^\star(I) = \mathop{\mathrm{argmin}}_{p \in \{1, \ldots, n-1\}} \mathrm{MSE}(p, I) = \mathop{\mathrm{argmin}}_{p \in \{1, \ldots, n-1\}} \mathbb{E}\Big[ \big( \hat R_p(I) - R(I) \big)^2 \Big].$$

It clearly appears that $\mathrm{MSE}(p, I)$ has the form of a function $\Phi(p, I, \alpha)$ (see Celisse and Robin, 2008, Proposition 2.1) depending on the unknown vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_D)$ with $\alpha_k = \mathbb{P}(X_i \in I_k)$. A natural idea is then to replace the $\alpha_k$'s in $\Phi(p, I, \alpha)$ by their empirical counterparts $\hat\alpha_k = n_k / n$, and an estimator of $p^\star(I)$ is therefore given by

$$\hat p(I) = \mathop{\mathrm{argmin}}_{p \in \{1, \ldots, n-1\}} \widehat{\mathrm{MSE}}(p, I) = \mathop{\mathrm{argmin}}_{p \in \{1, \ldots, n-1\}} \Phi(p, I, \hat\alpha).$$

The exact calculation of $\hat p(I)$ may be found in Theorem 3.1 from Celisse and Robin (2008).
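Hence, the procedure for estimating $\theta$ is the following one

1. For each partition $I \in \mathcal{I}$, define $\hat p(I) = \mathop{\mathrm{argmin}}_{p \in \{1, \ldots, n-1\}} \widehat{\mathrm{MSE}}(p, I)$,

2. Choose $\hat I = (\hat M, \hat\lambda, \hat\mu) \in \mathop{\mathrm{argmin}}_{I \in \mathcal{I}} \hat R_{\hat p(I)}(I)$ such that the width of the interval $[\hat\lambda, \hat\mu]$ is maximum,

3. Estimate $\theta$ by $\hat\theta_n^{CR} = \mathrm{card}\{i : X_i \in [\hat\lambda, \hat\mu]\} / [n(\hat\mu - \hat\lambda)]$.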
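The selection steps above can be sketched as follows, under a loud simplification: the data-driven choice $\hat p(I)$ of step 1 is replaced by a fixed value of $p$ (leave-one-out by default), since the closed form of $\hat p(I)$ is not reproduced here. The sampler, the function names and the numerical values are illustrative assumptions; only the LPO risk (8) and the partition collection (7) come from the text.

```python
import random

def lpo_risk(counts, widths, n, p):
    """Leave-p-out estimate (8) of the risk of the histogram on a partition."""
    a = (2 * n - p) / ((n - 1) * (n - p))
    b = n * (n - p + 1) / ((n - 1) * (n - p))
    s1 = sum(c / (n * w) for c, w in zip(counts, widths))
    s2 = sum((c / n) ** 2 / w for c, w in zip(counts, widths))
    return a * s1 - b * s2

def cr_estimator_fixed_p(p_values, m_min, m_max, p=1):
    """Steps 2 and 3 of the procedure, with p fixed (assumption) instead of p-hat(I)."""
    n = len(p_values)
    best_key, best = None, None
    for m in range(m_min, m_max + 1):
        M = 2 ** m
        fine = [0] * M                      # counts on the regular grid of width 1/M
        for x in p_values:
            fine[min(int(x * M), M - 1)] += 1
        for k in range(M - 1):              # lambda = k / M
            for l in range(k + 2, M + 1):   # mu = l / M
                central = sum(fine[k:l])
                counts = fine[:k] + [central] + fine[l:]
                widths = [1.0 / M] * k + [(l - k) / M] + [1.0 / M] * (M - l)
                # Smallest risk first; exact ties are broken by the widest [lambda, mu].
                key = (lpo_risk(counts, widths, n, p), -(l - k) / M)
                if best_key is None or key < best_key:
                    best_key = key
                    best = (central / (n * (l - k) / M), k / M, l / M)
    return best  # (estimate of theta, lambda-hat, mu-hat)

def sample_p_values(n, theta, delta, s, rng):
    # Illustrative assumption: alternative cdf F(x) = 1 - (1 - x / (1 - delta))^s
    # on [0, 1 - delta], sampled by inversion; f is decreasing and vanishes on [1 - delta, 1].
    return [rng.random() if rng.random() < theta
            else (1.0 - delta) * (1.0 - rng.random() ** (1.0 / s))
            for _ in range(n)]

rng = random.Random(2)
sample = sample_p_values(5000, theta=0.7, delta=0.5, s=4.0, rng=rng)
theta_hat, lam_hat, mu_hat = cr_estimator_fixed_p(sample, m_min=2, m_max=4)
print(theta_hat, lam_hat, mu_hat)
```

The double loop over $(k, l)$ makes the cost grow quadratically in $M$ for each resolution, which is why restricting $M$ to powers of two, as in (7), keeps the search manageable.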

Remark 2. In our procedure, we consider the set of natural partitions defined by (7), while Celisse and Robin (2010) use the one defined by

$$\mathcal{I} = \bigcup_{M_{\min} \le M \le M_{\max}} \mathcal{I}_M.$$

This change is natural for lowering the complexity of the algorithm and has no consequences on the theoretical properties of the estimator. In particular, if we assume the function $f$ vanishes on an interval $[1 - \delta, 1]$, then the complexity of the algorithm is simpler when we consider the following set of partitions

$$\mathcal{I} = \bigcup_{m_{\min} \le m \le m_{\max}} \mathcal{J}_{2^m},$$

where

$$\mathcal{J}_M = \Big\{ I^{(k)} = (I_i)_{i = 1, \ldots, k+1} : \forall i \le k,\ |I_i| = \frac{1}{M},\ |I_{k+1}| = 1 - \frac{k}{M},\ 1 \le k \le M - 2 \Big\}.$$

In Celisse and Robin (2010), the authors only establish convergence in probability of this estimator. Here, we prove its almost sure convergence and its $\sqrt{n}$-consistency, and establish that its variance is of the order $1/n$. Let us first introduce some assumptions.

Assumption 1. The density $f$ is null on an interval $[\lambda^\star, \mu^\star] \subset (0, 1]$ (with unknown values $\lambda^\star$ and $\mu^\star$) and $f$ is monotone outside the interval $[\lambda^\star, \mu^\star]$.

For example, $f$ is decreasing on $[0, \lambda^\star]$ and increasing on $[\mu^\star, 1]$. This assumption is stronger than Assumption A' in Celisse and Robin (2010), the latter not being sufficient to establish the result they claim (see the proof of Lemma 3 for more details). The monotonicity part of our assumption is not necessary, and we shall explain what is exactly required and how we use the previous assumption in the proof of Lemma 3. Under Assumption 1, the true parameter $\theta$ is equal to $g(x)$ for all $x$ in $[\lambda^\star, \mu^\star]$. Note that the case where we impose $\mu^\star = 1$ is included in this setting. We now introduce a technical condition that comes from Celisse and Robin (2010). We let

$$\forall (i, j) \in \mathbb{N}^2, \quad s_{ij} = \sum_{k=1}^{D} \frac{\alpha_k^i}{|I_k|^j},$$

and further assume that the collection of partitions $\mathcal{I}$ and density $f$ are such that

$$\forall I \in \mathcal{I}, \quad 8 s_{11} s_{21} - 2 s_{11}^2 + 8 s_{32} - 10 s_{21}^2 - 4 s_{22} \ne 0, \quad s_{21} - s_{22} - s_{32} + 3 s_{11} \ne 0. \tag{9}$$

This technical condition is used in Celisse and Robin (2010) to control the behaviour of the minimizer $\hat p(I)$. We are now ready to state our result, whose proof can be found in Section 5.3.

Proposition 3. Suppose that $f$ satisfies Assumption 1 as well as the technical condition (9). Assume moreover that $m_{\max}$ is large enough, then the estimator $\hat\theta_n^{CR}$ has the following properties

i) $\hat\theta_n^{CR}$ converges almost surely to $\theta$,

ii) $\hat\theta_n^{CR}$ is $\sqrt{n}$-consistent, i.e. $\sqrt{n}(\hat\theta_n^{CR} - \theta) = O_{\mathbb{P}}(1)$,

iii) If $p$ is fixed then $\limsup_{n \to \infty} \mathrm{Var}(\sqrt{n}\,\hat\theta_n^{CR}) < +\infty$.

Here again, asymptotic normality of $\hat\theta_n^{CR}$ or the exact value of its asymptotic variance are difficult to obtain. Heuristically, one can explain why this procedure outperforms the simpler histogram based approach with a fixed partition described in the previous section. Indeed, when considering a fixed partition, the latter should be fine enough to obtain convergence, but refining the partition increases the variance of $\hat\theta_{I,n}$. Here, Celisse and Robin's approach realizes a compromise on the size of the partition that is used.



3.3 One-step estimators 

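In this section, we introduce the one-step method to construct an asymptotically efficient estimator, relying on a $\sqrt{n}$-consistent one (see van der Vaart, 1998, Section 25.8). Let $\hat\theta_n$ be a $\sqrt{n}$-consistent estimator of $\theta$, then $\hat\theta_n$ can be discretized on grids of mesh width $n^{-1/2}$. Suppose that we are given a sequence of estimators $\hat l_{n,\theta}(\cdot) = \hat l_{n,\theta}(\cdot\,; X_1, \ldots, X_n)$ of the efficient score function $\tilde l_{\theta,f}$. Define, with $m = \lfloor n/2 \rfloor$,

$$\hat l_{n,\theta,i}(\cdot) = \begin{cases} \hat l_{m,\theta}(\cdot\,; X_1, \ldots, X_m) & \text{if } i > m, \\ \hat l_{n-m,\theta}(\cdot\,; X_{m+1}, \ldots, X_n) & \text{if } i \le m. \end{cases}$$

Thus, for $X_i$ ranging through each of the two halves of the sample, we use an estimator $\hat l_{n,\theta,i}$ based on the other half of the sample. We assume that, for every deterministic sequence $\theta_n = \theta + O(n^{-1/2})$, we have

$$\sqrt{n}\, \mathbb{P}_{\theta_n, f}\, \hat l_{n,\theta_n} \xrightarrow[n \to \infty]{\mathbb{P}} 0, \tag{10}$$

$$\mathbb{P}_{\theta_n, f}\, \big\| \hat l_{n,\theta_n} - \tilde l_{\theta_n, f} \big\|^2 \xrightarrow[n \to \infty]{\mathbb{P}} 0. \tag{11}$$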

Note that in the above notation, the term $\mathbb{P}_{\theta_n, f}\, \hat l$ for some random function $\hat l$ is an abbreviation for the integral $\int \hat l(x)\, d\mathbb{P}_{\theta_n, f}(x)$. Thus the expectation is taken with respect to $x$ only and not the random variables in $\hat l$. Now under the above assumptions, the one-step estimator defined as

$$\tilde\theta_n = \hat\theta_n + \Big( \sum_{i=1}^{n} \hat l_{n,\hat\theta_n,i}^{\,2}(X_i) \Big)^{-1} \sum_{i=1}^{n} \hat l_{n,\hat\theta_n,i}(X_i)$$

is asymptotically efficient at $(\theta, f)$ (see van der Vaart, 1998, Section 25.8). This estimator $\tilde\theta_n$ can be considered a one-step iteration of the Newton-Raphson algorithm for solving an approximation of the equation $\sum_i \tilde l_{\theta,f}(X_i) = 0$ with respect to $\theta$, starting at the initial guess $\hat\theta_n$.
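To see the one-step construction at work, the sketch below treats the idealized situation where $\delta$ is known, so that the efficient score of Theorem 1 can be used directly and no sample splitting is needed. This oracle setting, the uniform alternative on $[0, 1-\delta]$ and the numerical values are assumptions made for illustration; the update adds the normalized sum of efficient scores to the initial estimate, as in a Newton-Raphson step for the efficient score equation.

```python
import random

def efficient_score(x, theta, delta):
    """Efficient score: 1/theta - 1_{[0, 1-delta)}(x) / (theta (1 - theta delta))."""
    inside = 1.0 if x < 1.0 - delta else 0.0
    return 1.0 / theta - inside / (theta * (1.0 - theta * delta))

def one_step(theta_init, p_values, delta):
    """One-step update of theta_init based on the efficient score with known delta."""
    scores = [efficient_score(x, theta_init, delta) for x in p_values]
    return theta_init + sum(scores) / sum(s * s for s in scores)

def storey_estimator(p_values, lam):
    """Proportion of p-values larger than lam, divided by (1 - lam)."""
    n_above = sum(1 for p in p_values if p > lam)
    return n_above / (len(p_values) * (1.0 - lam))

def sample_p_values(n, theta, delta, rng):
    # Illustrative assumption: the alternative is uniform on [0, 1 - delta].
    return [rng.random() if rng.random() < theta
            else (1.0 - delta) * rng.random()
            for _ in range(n)]

rng = random.Random(4)
delta = 0.4
sample = sample_p_values(20_000, theta=0.7, delta=delta, rng=rng)

theta_init = storey_estimator(sample, lam=0.8)       # sqrt(n)-consistent, not efficient
theta_one_step = one_step(theta_init, sample, delta)
theta_oracle = storey_estimator(sample, lam=1.0 - delta)
print(theta_init, theta_one_step, theta_oracle)
```

In this oracle setting the one-step estimate lands very close to Storey's estimator with $\lambda = 1 - \delta$, whatever reasonable starting point is used; the difficulty in practice is that $\delta$ is unknown and must be estimated.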
Now, we discuss a converse result on necessary conditions for the existence of an asymptotically efficient estimator of $\theta$ and its implications in model $\mathcal{P}_\delta$.

Under condition (12), it is shown in Theorem 7.4 from van der Vaart (2002) that the existence of an asymptotically efficient sequence of estimators of $\theta$ implies the existence of a sequence of estimators $\hat l_{n,\theta}$ of $\tilde l_{\theta,f}$ satisfying (10) and (11). In our case, it is not difficult to prove that condition (12) holds. Then, the estimator $\hat l_{n,\theta}$ of the efficient score function $\tilde l_{\theta,f}$ must satisfy both a "no-bias" condition (10) and a consistency condition (11). The consistency is usually easy to arrange, but the "no-bias" condition requires a convergence to zero of the bias at a rate faster than $1/\sqrt{n}$. We thus obtain the following proposition, whose proof can be found in Section 5.2.

Proposition 4. The existence of an asymptotically efficient sequence of estimators of $\theta$ in model $\mathcal{P}_\delta$ is equivalent to the existence of a sequence of estimators $\hat l_{n,\theta}$ of the efficient score function $\tilde l_{\theta,f}$ satisfying (10) and (11). Moreover, if the efficient score function $\tilde l_{\theta,f}$ is estimated through a plug-in method that relies on an estimate $\hat\delta_n$ of the parameter $\delta$, then this condition is equivalent to $\sqrt{n}(\hat\delta_n - \delta) = o_{\mathbb{P}}(1)$.

Let us now explain the consequences of this result. The proposition states that efficient estimators of $\theta$ exist if and only if estimators of $\tilde l_{\theta,f}$ that satisfy (10) and (11) can be constructed. As there is no general method to estimate an efficient score function, such an estimator should rely on the specific expression (4). Though we cannot claim that all estimators of $\tilde l_{\theta,f}$ are plug-in estimates based on an estimator of the parameter $\delta$ plugged into expression (4), it is likely to be the case. Then, the existence of efficient estimators of $\theta$ is equivalent to the existence of estimators of $\delta$ that converge at a faster than parametric rate. Note that this is possible for irregular models (see Chapter 6 in Ibragimov and Has'minskii, 1981, for more details). However, for regular models, such estimators cannot be constructed and one might conjecture that efficient estimators of $\theta$ do not exist in regular models.

4 Simulations 

In this section, we illustrate the previous results on simulated experiments and explore the
non-asymptotic performance of the estimators of $\theta$ discussed previously. We compare
three estimators: the histogram-based estimator $\hat\theta_{I,n}$ defined in Section 3.1 through (6),
the more elaborate histogram-based estimator $\hat\theta_n^{CR}$ proposed in Celisse and Robin (2010)
and finally the estimator of Langaas et al. (2005), denoted by $\hat\theta_n^{L}$ and defined as the value
$\hat g(X_{(n)})$, where $X_{(n)}$ is the largest p-value and $\hat g$ is Grenander's estimator of a decreasing
density. We investigate the behaviour of these three estimators of $\theta$ under two different
setups: $\delta = 0$ and $\delta \in (0, 1)$. More precisely, we consider the alternative density $f$ given by

where $\delta \in [0, 1)$ and $s > 1$. This form of density is introduced in Celisse and Robin (2010)
and covers various situations when varying its parameters. Note that $f$ is always decreasing,
convex when $s > 2$ and concave when $s \in (1, 2]$. In the experiments, we consider a total of
8 different models corresponding to different parameter values. These models are labeled
as described in Table 1, distinguishing the cases $\delta = 0$ and $\delta > 0$. As an illustration,
Figure 1 represents the densities of the p-values for 4 of the 8 models. For each estimator
$\hat\theta_n$ of $\theta$, we compare the quantity $n\,\mathbb{E}[(\hat\theta_n - \theta)^2]$ with the optimal variance
$\theta(\delta^{-1} - \theta)$ when this bound exists. Equivalently, we compare the logarithm of the mean
squared error, $\log(\mathrm{MSE}) = \log \mathbb{E}[(\hat\theta_n - \theta)^2]$, for each estimator $\hat\theta_n$
with $-\log(n) + \log[\theta(\delta^{-1} - \theta)]$. When $\delta = 0$, we only compare the slope of the line
induced by $\log(\mathrm{MSE})$ with the parametric rate, which corresponds to a slope of $-1$. In each case,
we simulated data with sample size $n \in \{5000; 7000; 9000; 10000; 12000; 14000; 15000\}$ and
performed $R = 100$ repetitions.
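
The comparison of $n\,\mathbb{E}[(\hat\theta_n - \theta)^2]$ with $\theta(\delta^{-1} - \theta)$ can be sketched on a toy case. The block below is only an illustration under stated assumptions: the alternative density is assumed to be $f(x) = \frac{s}{1-\delta}(1 - \frac{x}{1-\delta})^{s-1}$ on $[0, 1-\delta]$ (a form consistent with the description above, not copied from the text), and the estimator is a hypothetical "oracle" that knows $\delta$ and counts the p-values falling in $[1-\delta, 1]$; it is not one of the three estimators studied here.

```python
import random


def sample_p_value(theta, delta, s, rng):
    """Draw one p-value from the mixture theta * U[0,1] + (1 - theta) * f.

    ASSUMPTION: f(x) = s/(1-delta) * (1 - x/(1-delta))**(s-1) on [0, 1-delta].
    """
    if rng.random() < theta:
        return rng.random()  # true null hypothesis: uniform p-value
    u = rng.random()
    # Inverse of the CDF F(x) = 1 - (1 - x/(1-delta))**s of the assumed f.
    return (1.0 - delta) * (1.0 - (1.0 - u) ** (1.0 / s))


def oracle_estimator(sample, delta):
    """Hypothetical oracle: fraction of p-values in [1-delta, 1], divided by delta."""
    count = sum(1 for x in sample if x >= 1.0 - delta)
    return count / (len(sample) * delta)


def scaled_mse(theta, delta, s, n, repetitions, seed=0):
    """Monte Carlo estimate of n * E[(theta_hat - theta)^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repetitions):
        sample = [sample_p_value(theta, delta, s, rng) for _ in range(n)]
        total += (oracle_estimator(sample, delta) - theta) ** 2
    return n * total / repetitions


theta, delta, s = 0.6, 0.3, 3.0          # illustrative parameter values
bound = theta * (1.0 / delta - theta)    # optimal variance theta * (1/delta - theta)
estimate = scaled_mse(theta, delta, s, n=1000, repetitions=400)
print(f"n * MSE = {estimate:.3f}, bound = {bound:.3f}")
```

Because the oracle count is binomial with success probability $\theta\delta$, its scaled variance equals the bound exactly, so the two printed numbers should agree up to Monte Carlo error. Smaller sample sizes than in the experiments are used here only to keep the run short.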

When computing the estimator $\hat\theta_{I,n}$, the choice of the partition $I$ surely affects the
results. Here, we have chosen a regular partition $I$ that is fine enough (we fixed
$|I_k| < \delta$) but not too fine (too small a value of $|I_k|$ increases the variance). The
choice of the partition in the simple procedure $\hat\theta_{I,n}$ is an issue for real data problems. Our
goal here is to show that on simulated experiments, the "best" of these estimators still has
a larger variance than $\hat\theta_n^{CR}$. Note that the partition $I$ is always included in the collection $\mathcal{I}$
of partitions from which $\hat\theta_n^{CR}$ is computed.
Figure 1: Density function of the p-values. Top left: model (b1); top right: model (d1);
bottom left: model (a2); bottom right: model (c2).



The results are presented in Figure 2 for the case $\delta > 0$ and Figure 3 for the case $\delta = 0$.
First, we note that in both cases ($\delta > 0$ and $\delta = 0$), the estimator $\hat\theta_n^{L}$ of Langaas et al. has
a nonparametric rate of convergence (null slope) and performs badly compared to $\hat\theta_{I,n}$ and
$\hat\theta_n^{CR}$. In particular, when $\delta = 0$ the two histogram-based procedures $\hat\theta_{I,n}$ and $\hat\theta_n^{CR}$ have
better performances than the estimator $\hat\theta_n^{L}$ despite the fact that the latter is dedicated to
the convex decreasing setup. Now, when $\delta > 0$, both estimators $\hat\theta_{I,n}$ and $\hat\theta_n^{CR}$ exhibit a
parametric rate of convergence (slope equal to $-1$). Moreover, $\hat\theta_n^{CR}$ has a smaller variance
than $\hat\theta_{I,n}$ (smaller intercept) and this variance is very close to the optimal one $\theta(\delta^{-1} - \theta)$.
Now, when $\delta = 0$, we observe two different behaviors depending on whether $f$ is convex or
not. Indeed, for models (a2) and (b2) corresponding to the convex case, we observe that
both estimators $\hat\theta_{I,n}$ and $\hat\theta_n^{CR}$ still exhibit a parametric rate of convergence, with a smaller
variance for $\hat\theta_n^{CR}$. These estimators are thus robust to the assumption that $f$ vanishes on
an interval in the convex setup. The results are slightly different when considering models
(c2) and (d2), where $f$ is now concave. These estimators have a more erratic behaviour,
exhibiting either a parametric rate of convergence ($\hat\theta_n^{CR}$ in model (c2) and $\hat\theta_{I,n}$ in model (d2))
or nonparametric rates. Their respective performances in terms of variance are also less
clear. Nonetheless we conclude that $\hat\theta_n^{CR}$ seems to exhibit the overall best performances,
with parametric rate of convergence and almost optimal asymptotic variance.

Table 1: Labels of the 8 models with different parameter values.

    (s, θ)        δ = 0.3    δ = 0
    (3, 0.6)      (a1)       (a2)
    (3, 0.8)      (b1)       (b2)
    (1.4, 0.7)    (c1)       (c2)
    (1.4, 0.9)    (d1)       (d2)
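
The "slope" and "intercept" readings used in this discussion come from a linear regression of $\log(\mathrm{MSE})$ on $\log(n)$. A minimal sketch of that step follows; the helper name `least_squares_line` and the input values are hypothetical (the mean squared errors are set equal to the bound $\theta(\delta^{-1}-\theta)/n$ rather than taken from the experiments), so the fitted line is the reference line by construction.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


sample_sizes = [5000, 7000, 9000, 10000, 12000, 14000, 15000]
theta, delta = 0.6, 0.3
optimal_variance = theta * (1.0 / delta - theta)

# HYPOTHETICAL input: an estimator whose MSE equals the bound exactly.
log_n = [math.log(n) for n in sample_sizes]
log_mse = [math.log(optimal_variance / n) for n in sample_sizes]

slope, intercept = least_squares_line(log_n, log_mse)
print(slope, intercept, math.log(optimal_variance))
```

A slope close to $-1$ indicates a parametric rate, and the intercept is then an estimate of the logarithm of the asymptotic variance, which is how the estimators are ranked above.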

5 Proofs 

5.1 Proofs from Introduction and Section 2 

Proof of Proposition 1. Sufficiency: Let us suppose that for all $f \in \mathcal{F}$ and for all $c \in (0, 1)$,
we have $c + (1 - c)f \notin \mathcal{F}$. We prove by contradiction that the parameters $\theta$ and $f$ are identifiable on
the set $(0,1) \times \mathcal{F}$. Suppose that there exist $(\theta_1, f_1)$ and $(\theta_2, f_2) \in (0,1) \times \mathcal{F}$ with
$(\theta_1, f_1) \neq (\theta_2, f_2)$ such that

$$\theta_1 + (1 - \theta_1) f_1(x) = \theta_2 + (1 - \theta_2) f_2(x), \quad \text{for all } x \in [0, 1]. \qquad (13)$$

We can always assume that $\theta_1 > \theta_2$ (if $\theta_1 = \theta_2$, then (13) gives $f_1 = f_2$). Let us denote $c = (\theta_1 - \theta_2)/(1 - \theta_2)$, then $c \in (0, 1)$.
We obtain that

$$\theta_1 + (1 - \theta_1) f_1(x) = \theta_2 + (1 - \theta_2)\big(c + (1 - c) f_1(x)\big), \quad \text{for all } x \in [0, 1]. \qquad (14)$$

From (13) and (14), we have $f_2 = c + (1 - c) f_1$. This means that there exist $f_1 \in \mathcal{F}$ and
$c \in (0, 1)$ such that $c + (1 - c) f_1 \in \mathcal{F}$, which is a contradiction.

Necessity: Suppose that the parameters $\theta$ and $f$ are identifiable on the set $(0, 1) \times \mathcal{F}$. We
prove by contradiction that for all $f \in \mathcal{F}$ and for all $c \in (0, 1)$, we have $c + (1 - c)f \notin \mathcal{F}$.
Indeed, suppose that there exist $f \in \mathcal{F}$ and $c \in (0, 1)$ such that $c + (1 - c)f \in \mathcal{F}$. For all
$\theta_1 \in (0, 1)$, we denote $\theta_2 = c + (1 - c)\theta_1$, then we obtain

$$\theta_1 + (1 - \theta_1)\big(c + (1 - c) f(x)\big) = \theta_2 + (1 - \theta_2) f(x), \quad \text{for all } x \in [0, 1].$$

This implies that $\theta$ and $f$ are not identifiable on the set $(0, 1) \times \mathcal{F}$.

□
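
The algebra behind identity (14) can be checked numerically. The sketch below is only an illustration: the density `f1` and the values of `theta1` and `theta2` are arbitrary choices, not taken from the text. It builds $c = (\theta_1 - \theta_2)/(1 - \theta_2)$ and $f_2 = c + (1-c) f_1$, then verifies that the two mixtures coincide on a grid, which is exactly the non-identifiability situation excluded by the condition of Proposition 1.

```python
def mixture(theta, f):
    """Return the mixture density x -> theta + (1 - theta) * f(x) on [0, 1]."""
    return lambda x: theta + (1.0 - theta) * f(x)


def f1(x):
    """Illustrative density on [0, 1] (hypothetical choice): f1(x) = 2 * (1 - x)."""
    return 2.0 * (1.0 - x)


theta1, theta2 = 0.7, 0.4                 # any pair with theta1 > theta2
c = (theta1 - theta2) / (1.0 - theta2)    # lies in (0, 1)


def f2(x):
    """Second density c + (1 - c) * f1, as in the proof."""
    return c + (1.0 - c) * f1(x)


g1 = mixture(theta1, f1)
g2 = mixture(theta2, f2)
grid = [i / 100 for i in range(101)]
max_gap = max(abs(g1(x) - g2(x)) for x in grid)
print(c, max_gap)
```

The two parameter pairs are different but give the same density of the p-values, so $\theta$ cannot be recovered unless the class $\mathcal{F}$ excludes densities of the form $c + (1-c)f$.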
Figure 2: Logarithm of the mean squared error as a function of log(n) and corresponding
linear regression for $\hat\theta_n^{L}$ (○, black line), $\hat\theta_n^{CR}$ (blue line)
and $\hat\theta_{I,n}$ (•, green line) in the case $\delta = 0.3$, for different parameter values:
(a1) top left; (b1) top right; (c1) bottom left; (d1) bottom right. The red line represents the
line $y = -\log(n) + \log[\theta(\delta^{-1} - \theta)]$.

Figure 3: Logarithm of the mean squared error as a function of log(n) and corresponding
linear regression for $\hat\theta_n^{L}$ (○, black line), $\hat\theta_n^{CR}$ (blue line)
and $\hat\theta_{I,n}$ (•, green line) in the case $\delta = 0$, for different parameter values:
(a2) top left; (b2) top right; (c2) bottom left; (d2) bottom right. The red line represents the
line $y = -\log(n) + c$ for some well chosen constant $c$.
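Proof of Theorem 1. According to the expression (3) of the ordinary score, we can write

$$\dot l_{\theta,f}(x) = \Big[\frac{1 - f(x)}{\theta + (1-\theta) f(x)} + \frac{\delta}{1 - \theta\delta}\Big] 1_{[0,1-\delta)}(x) + \frac{1}{\theta}\, 1_{[1-\delta,1]}(x) - \frac{\delta}{1-\theta\delta}\, 1_{[0,1-\delta)}(x). \qquad (15)$$

Let us recall that $\Pi_{\theta,f}$ is the orthogonal projection onto the closure of the linear span of
${}_f\dot{\mathcal{P}}_\delta$ in $L^2(\mathbb{P}_{\theta,f})$. We prove that the orthogonal projection of $\dot l_{\theta,f}$ onto this space is equal to
the first term appearing in the right-hand side of (15), namely

$$\Pi_{\theta,f}\,\dot l_{\theta,f}(x) = \Big[\frac{1 - f(x)}{\theta + (1-\theta) f(x)} + \frac{\delta}{1 - \theta\delta}\Big] 1_{[0,1-\delta)}(x), \qquad (16)$$

and then the efficient score function for $\theta$ is

$$\tilde l_{\theta,f}(x) = \dot l_{\theta,f}(x) - \Pi_{\theta,f}\,\dot l_{\theta,f}(x) = \frac{1}{\theta}\, 1_{[1-\delta,1]}(x) - \frac{\delta}{1-\theta\delta}\, 1_{[0,1-\delta)}(x).$$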

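In fact, we can write

$$\Big[\frac{1 - f}{\theta + (1-\theta) f} + \frac{\delta}{1 - \theta\delta}\Big] 1_{[0,1-\delta)} = \frac{(1-\theta) f h_0}{\theta + (1-\theta) f},$$

where

$$h_0(x) = \frac{1 - (1-\delta) f(x)}{(1-\theta)(1-\theta\delta)\, f(x)}\, 1_{[0,1-\delta)}(x).$$

The function $h_0$ is continuous and decreasing on $[0, 1 - \delta)$. It is not difficult to examine the
condition $\int f h_0 = 0$. Indeed,

$$\int_0^1 f(x) h_0(x)\,dx = -\frac{1}{(1-\theta)(1-\theta\delta)} \int_0^{1-\delta} \big[(1-\delta) f(x) - 1\big]\,dx
= -\frac{1}{(1-\theta)(1-\theta\delta)} \Big[\int_0^{1-\delta} (1-\delta) f(x)\,dx - (1-\delta)\Big] = 0.$$

Hence

$$\Big[\frac{1 - f}{\theta + (1-\theta) f} + \frac{\delta}{1 - \theta\delta}\Big] 1_{[0,1-\delta)} \ \text{belongs to}\ \overline{\mathrm{lin}}\,({}_f\dot{\mathcal{P}}_\delta).$$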

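Now, to conclude the proof of (16), it is necessary to establish that the second term in the
right-hand side of (15) is orthogonal to the closure of the linear span of ${}_f\dot{\mathcal{P}}_\delta$, namely

$$\frac{1}{\theta}\, 1_{[1-\delta,1]} - \frac{\delta}{1-\theta\delta}\, 1_{[0,1-\delta)} \ \perp\ \overline{\mathrm{lin}}\,({}_f\dot{\mathcal{P}}_\delta),$$

where $\perp$ means orthogonality in $L^2(\mathbb{P}_{\theta,f})$. In fact, for every score function

$$h = \frac{(1-\theta) f h_0}{\theta + (1-\theta) f} \in {}_f\dot{\mathcal{P}}_\delta,$$

the scalar product between $h$ and the remaining term in (15) is given by

$$\int_0^1 \Big[\frac{1}{\theta(1-\theta\delta)}\, 1_{[1-\delta,1]}(x) - \frac{\delta}{1-\theta\delta}\Big] \frac{(1-\theta) f(x) h_0(x)}{\theta + (1-\theta) f(x)}\, \big[\theta + (1-\theta) f(x)\big]\,dx$$
$$= \frac{1-\theta}{\theta(1-\theta\delta)} \int_0^1 f(x) h_0(x)\, 1_{[1-\delta,1]}(x)\,dx - \frac{\delta(1-\theta)}{1-\theta\delta} \int_0^1 f(x) h_0(x)\,dx = 0,$$

since $f$ vanishes on $[1-\delta, 1]$ and $\int f h_0 = 0$.

This establishes (16). Let us now calculate the efficient information

$$\tilde I_{\theta,f} = \mathbb{P}_{\theta,f}\big(\tilde l_{\theta,f}^{\,2}\big)
= \int_0^1 \Big(\frac{1}{\theta^2}\, 1_{[1-\delta,1]}(x) + \frac{\delta^2}{(1-\theta\delta)^2}\, 1_{[0,1-\delta)}(x)\Big) \big[\theta + (1-\theta) f(x)\big]\,dx
= \frac{\delta}{\theta(1 - \theta\delta)}.$$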



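We now turn to the particular case where $\delta = 0$. In this case the previous computations
show that $\dot l_{\theta,f}$ belongs to the closure of the linear span of ${}_f\dot{\mathcal{P}}_0$ and that the Fisher information
is zero. Now, we show that the functional $\psi(\mathbb{P}_{\theta,f}) = \theta$ is not differentiable at $\mathbb{P}_{\theta,f}$ relative
to the tangent set $\dot{\mathcal{P}}_0 = \mathrm{lin}(\dot l_{\theta,f} + {}_f\dot{\mathcal{P}}_0) = {}_f\dot{\mathcal{P}}_0$. In fact, if this were true, there would exist
a function $\tilde\psi_{\theta,f}$ such that

$$a = \frac{d}{dt}\, \psi\big(\mathbb{P}_{\theta + ta, f_t}\big)\Big|_{t=0} = \big\langle \tilde\psi_{\theta,f},\ a\,\dot l_{\theta,f} + h \big\rangle, \qquad \forall a \in \mathbb{R},\ \forall h \in {}_f\dot{\mathcal{P}}_0,$$

where $\langle\cdot,\cdot\rangle$ denotes the scalar product in $L^2(\mathbb{P}_{\theta,f})$. Choosing $h = -\dot l_{\theta,f} \in {}_f\dot{\mathcal{P}}_0$, we obtain
$a = (a - 1)\langle \tilde\psi_{\theta,f}, \dot l_{\theta,f}\rangle$ for every value $a \in \mathbb{R}$, which is impossible.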

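For the rest of the proof, we set $\delta > 0$. Using Lemma 25.25 in van der Vaart (1998), we
remark that the functional $\psi(\mathbb{P}_{\theta,f}) = \theta$ is differentiable at $\mathbb{P}_{\theta,f}$ relative to the tangent set
$\dot{\mathcal{P}}_\delta$ with efficient influence function given by

$$\tilde\psi_{\theta,f}(x) = \tilde I_{\theta,f}^{-1}\, \tilde l_{\theta,f}(x)
= \frac{\theta(1-\theta\delta)}{\delta} \Big(\frac{1}{\theta}\, 1_{[1-\delta,1]}(x) - \frac{\delta}{1-\theta\delta}\, 1_{[0,1-\delta)}(x)\Big)
= \frac{1-\theta\delta}{\delta}\, 1_{[1-\delta,1]}(x) - \theta\, 1_{[0,1-\delta)}(x)
= \frac{1}{\delta}\, 1_{[1-\delta,1]}(x) - \theta,$$

which concludes the proof. □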
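
The quantities obtained in this proof are easy to verify numerically, because the efficient score is piecewise constant: it equals $1/\theta$ on $[1-\delta, 1]$ (an event of probability $\theta\delta$, since $f$ vanishes there) and $-\delta/(1-\theta\delta)$ on $[0, 1-\delta)$. The following sketch, with illustrative values of $\theta$ and $\delta$ chosen here, checks that the score is centred, that its second moment is $\delta/(\theta(1-\theta\delta))$, and that dividing by the information gives the influence function $\delta^{-1} 1_{[1-\delta,1]} - \theta$.

```python
theta, delta = 0.6, 0.3   # illustrative values with delta > 0

# Probability of each piece under the mixture density theta + (1 - theta) * f.
p_tail = theta * delta            # P(X in [1 - delta, 1]), where f vanishes
p_body = 1.0 - p_tail             # P(X in [0, 1 - delta))

# Values of the efficient score on each piece.
score_tail = 1.0 / theta
score_body = -delta / (1.0 - theta * delta)

mean_score = score_tail * p_tail + score_body * p_body
information = score_tail ** 2 * p_tail + score_body ** 2 * p_body
closed_form = delta / (theta * (1.0 - theta * delta))

# Efficient influence function = efficient score / efficient information.
influence_tail = score_tail / information
influence_body = score_body / information
optimal_variance = 1.0 / information

print(mean_score, information, closed_form)
print(influence_tail, influence_body, optimal_variance)
```

The inverse of the information is the optimal asymptotic variance $\theta(\delta^{-1} - \theta)$ used as the benchmark in the simulations.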
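Proof of Corollary 1. We start by dealing with the case $\delta = 0$. Let us recall that in this
case, the ordinary score $\dot l_{\theta,f}$ belongs to ${}_f\dot{\mathcal{P}}_0$. We first remark that this tangent set ${}_f\dot{\mathcal{P}}_0$ is a
linear subspace of $L^2(\mathbb{P}_{\theta,f})$ with infinite dimension. So we can choose an orthonormal basis
$\{h_i\}_{i=1}^{\infty}$ of ${}_f\dot{\mathcal{P}}_0$ such that for every $m$, we have $\dot l_{\theta,f} \notin {}_f\dot{\mathcal{P}}_{0,m} := \mathrm{lin}(h_1, h_2, \ldots, h_m)$. We thus
have

$$\sup_{E_0} \liminf_{n\to\infty} \sup_{g \in E_0} \mathbb{E}_{\mathbb{P}_{1/\sqrt{n},g}} \big[\sqrt{n}\big(\hat\theta_n - \psi(\mathbb{P}_{1/\sqrt{n},g})\big)\big]^2
\ \ge\ \sup_{F_0} \liminf_{n\to\infty} \sup_{g \in F_0} \mathbb{E}_{\mathbb{P}_{1/\sqrt{n},g}} \big[\sqrt{n}\big(\hat\theta_n - \psi(\mathbb{P}_{1/\sqrt{n},g})\big)\big]^2,$$

where $E_0$ and $F_0$ range through all finite subsets of the tangent sets $\dot{\mathcal{P}}_0 = \mathrm{lin}(\dot l_{\theta,f} + {}_f\dot{\mathcal{P}}_0) =
{}_f\dot{\mathcal{P}}_0$ and $\mathrm{lin}(\dot l_{\theta,f} + {}_f\dot{\mathcal{P}}_{0,m})$, respectively. The efficient score function for $\theta$
corresponding to the tangent set ${}_f\dot{\mathcal{P}}_{0,m}$ is

$$\tilde l_{\theta,f,m} = \dot l_{\theta,f} - \sum_{i=1}^{m} \langle \dot l_{\theta,f}, h_i \rangle\, h_i.$$

Moreover, the efficient information $\tilde I_{\theta,f,m} = \mathbb{P}_{\theta,f}(\tilde l_{\theta,f,m}^{\,2})$ is non zero. Using Lemma 25.25
from van der Vaart (1998), we remark that the functional $\psi(\mathbb{P}_{\theta,f}) = \theta$ is differentiable at
$\mathbb{P}_{\theta,f}$ relative to the tangent set $\mathrm{lin}(\dot l_{\theta,f} + {}_f\dot{\mathcal{P}}_{0,m})$ with efficient influence function $\tilde\psi_{\theta,f,m} =
\tilde I_{\theta,f,m}^{-1}\, \tilde l_{\theta,f,m}$. So we can apply Theorem 25.21 from van der Vaart (1998) to obtain that

$$\sup_{F_0} \liminf_{n\to\infty} \sup_{g \in F_0} \mathbb{E}_{\mathbb{P}_{1/\sqrt{n},g}} \big[\sqrt{n}\big(\hat\theta_n - \psi(\mathbb{P}_{1/\sqrt{n},g})\big)\big]^2 \ \ge\ \tilde I_{\theta,f,m}^{-1}.$$

Since $\tilde I_{\theta,f,m} \to \tilde I_{\theta,f} = 0$ as $m \to \infty$, we obtain the result. The second part of the proof concerning
$\delta > 0$ is an immediate consequence of Theorem 1 together with Theorem 25.21 and Lemma
25.23 in van der Vaart (1998). □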

5.2 Proofs from Sections 3.1 and 3.3 

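Proof of Proposition 2. Let us denote by $\mathcal{D} = \{1, 2, \ldots, D\}$, $\mathcal{D}_0 = \{k \in \mathcal{D} \text{ such that } I_k \subseteq
[1 - \delta, 1]\}$ and $\mathcal{D}_1 = \mathcal{D} \setminus \mathcal{D}_0 = \{k \in \mathcal{D} \text{ such that } I_k \not\subseteq [1-\delta,1]\}$. We start by proving that
the estimator $\hat\theta_{I,n}$ converges almost surely to $\theta$. Indeed, we can write that

$$\hat\theta_{I,n} = \theta + \sum_{k \in \mathcal{D}_0} \Big(\frac{n_k}{n|I_k|} - \theta\Big) 1\{\hat k_n = k\} + (\hat\theta_{I,n} - \theta)\, 1\{I_{\hat k_n} \not\subseteq [1-\delta, 1]\}, \qquad (17)$$

where $1\{A\}$ or $1_A$ is used to denote the indicator function of the set $A$. By using the strong
law of large numbers, we have the almost sure convergences

$$\forall k \in \mathcal{D}_0,\quad \frac{n_k}{n|I_k|} \xrightarrow[n\to+\infty]{a.s.} \theta,
\qquad
\forall k \in \mathcal{D}_1,\quad \frac{n_k}{n|I_k|} \xrightarrow[n\to+\infty]{a.s.} \frac{\alpha_k}{|I_k|} = \frac{1}{|I_k|} \int_{I_k} g(u)\,du > \theta.$$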
As a consequence, we obtain that the second term in the right-hand side of (17) converges
almost surely to zero, namely



kev 1 K| kdV Q 



n k _ 

k\ 



n— >+oo 



The third term in the right-hand side of (17) also converges almost surely to zero. Indeed, 
we have 

\§ Ijn -e\i{i kn ^[i-8,i)}< ( Yl 1 &n = k}. 

1 fe| keV! 

We fix an integer $k_0 \in \mathcal{D}_0$; then for all $k \in \mathcal{D}_1$, we have



H 

hr, > 



ik\ n\ij\ 



n\h\ n\I ko \ 
n\Iko\ 141 n\h 



\ hi ' 



Since = a k /\I k \ — > and 



n\I ko \ |Jfc| n|4| n^+oo 

we obtain that 



Lnii.„ id nil J n->+oo 



'feo 

which concludes the proof of the almost sure convergence of $\hat\theta_{I,n}$.
We now write

n k 

Vn\- 
\i 

k£V 



The second term in the right hand-side of the previous equation converges in probability 
to zero. Indeed, for any e > 0, we have 



F(Vn\e I>n -e\l{I kn ^[l-S,l}}>e) < F(I %n <£ [1-6,1]) 

n k a , a k n k 

FT ~° + TFT iF 

'fcol l J fcl n \ 1 k 



< 



S P (TrT ~ 9 + ir\~ TFT - efc ) ► °" 

/ 

Now, whenever $k \in \mathcal{D}_0$, by denoting

the central limit theorem gives the convergence in distribution 

n\h\ ' n-^+oo \ V 






As a consequence, each of these terms is bounded in probability. According to (18), we
conclude

$$\sqrt{n}(\hat\theta_{I,n} - \theta) = O_P(1).$$

We now prove the third statement of the proposition. We have

$$\mathrm{Var}(\sqrt{n}\,\hat\theta_{I,n}) \le \mathbb{E}\big[\big(\sqrt{n}(\hat\theta_{I,n} - \theta)\big)^2\big],$$

where



E 



K vife - em - e *[K^ - »)) h„4 + e »[K&r - »)) u 



fee©! 



The second term in the right-hand side of (19) is bounded by 



(19) 



n k 
n\h\ 



hn — fa 



1 N2 

< ( max — - - 
i<k<D \I k \ 



n¥(k n = k), 



where for all $k \in \mathcal{D}_1$, fixing an integer $k_0 \in \mathcal{D}_0$ and according to Hoeffding's inequality,
k) < F(^< Hk ° 



n\I k \ n\I ko 
n 1 1 

^ p [E(rn 1 ^ G ^o}-^+^-nq l {^e/ fc }) > 



ne fc 



< exp [- 2ne|(— + ' 



\h\ I 



ko 



Thus, we get that 



n k 
n\Ik\ 



< ( max — *— 

i<a;<d 141 



nexp 

k£T>! 



2nei 



1 1 

+ 



> 0. 



fcv l4| ' |4 

For the first term in the right-hand side of (19), we apply Cauchy-Schwarz's inequality 



fce^o 



r? 



n\h\ 



E = k ) 

kev 



< 



n|4| 



(20) 



where for all fc G £>o 

E 

= ±E 
n 



E 



- n k 
•n|4| 



1 \ 4 

— l{Xi G 4} - 6 
4 



+ 



n — 1. 



-E^ 



1 40 60 2 _ 3 . 
nV|4| 3 |4| 2 + |4| 



n 

n — 1 
n 



4- 



(21) 
Thus, we finally obtain that



Var(Vr^/ in ) < 

( 



N 



^ in(\h 



49 69 2 
+ 



kev 



nV|/ fc |3 |4|2 |4| 



max - — r — o\ n exp 
i<k<D \I k \ J ^— ' 



3^ 3 + 



n — 1 



+ 



1 



1 



2nE Hua + ,/,i 



n— >+oo 



□ 



Proof of Proposition 4. Let us first establish that condition (12) holds. In fact, with the
notation $p_{\theta,f} = \theta + (1 - \theta)f$, we have



< 



1,1/2 



»V2||2 



k n j(x)\ Pe n j{x) - kj(x)Jpe,f(x)) dx 



9n,f air e n j ~ l e,f ar o,f 



^ 2 { l e„,f(x)-l e j(x)) pg nJ (x)dx + 2 l ef (x)(Jpe n j(x)-Jpe,f(x))dx 



< 2 



1 1 



+ 



1 



I9 n 9 \9(l-95) 9 n (l-9 n 5) 



L {/(z)>0} 



po n j(x)dx 



+2 



i r 



1 



1 



o 19 9(1 -95) 



L {/(*)>0} 



VPdn,f(x) + y/p$A x ) 



;dx 



< 2 1 (9 n - ey 
■i 



+ 



£(g + £„) + ! 

(1-M)(1-M) {/W>0} 



Pe nJ (x)dx 



+2 



0) 2 2 



1 



2 2 (1 



C C(l + 2C0 



+ 



Q2 02(!_0)2 



(W(x)) 2 



L03 03(1 



n' 



where $C$ is some positive constant. Thus, according to Theorem 7.4 from van der Vaart
(2002), the existence of an asymptotically efficient sequence of estimators of $\theta$ is equivalent
to the existence of a sequence of estimators $\hat l_{n,\theta}$ satisfying (10) and (11).

Now in model $\mathcal{P}_\delta$, the efficient score function $\tilde l_{\theta,f}$ is given by

$$\tilde l_{\theta,f}(x) = \frac{1}{\theta}\, 1_{[1-\delta,1]}(x) - \frac{\delta}{1-\theta\delta}\, 1_{[0,1-\delta)}(x),$$

so that it is natural to estimate the parameter $\delta$ in order to estimate $\tilde l_{\theta,f}$. Let $\hat\delta_n$ be any
given consistent (in probability) estimator of $\delta$. Let us examine condition (10) more closely.
We have



VnP 9n Jn,e n =Vr$e n j{in,e n - k n ,f) 



1 



On 
n 



1 



x u n u n 

1 1 



(x) 



1 



+ 



Vn LV 1 
1 



l[o,i-5)( x ) 9e n j{x)dx 



i-e n s\ [° 

re(<5 n - J) 



i-5 n )( x ) ~ 1 [0,1-5)0*0) 9e n ,f(x)dx 



9e n j{x) 



W)(i-M„) 



-fix + v^J- 



1-5. 



1-5 



n 9e n j{x) 

- o n s 



dx 



n(5 n - 5) 



1-5 



(1 



9ej{x) , - <5) . 

ax —~ hop(l) 



Hence, the "no-bias" condition (10) is equivalent to the existence of an estimator $\hat\delta_n$ of
$\delta$ that converges at a rate faster than $1/\sqrt{n}$, namely such that $\sqrt{n}(\hat\delta_n - \delta) = o_P(1)$. With
the same argument as in the previous calculation, the consistency condition (11) is satisfied
as soon as the estimator $\hat\delta_n$ converges in probability to $\delta$. □



5.3 Proof of Proposition 3 

For each partition $I$, let us denote by $\mathcal{F}_I$ the vector space of piecewise constant functions
built from the partition $I$ and by $g_I$ the orthogonal projection of $g \in L^2([0, 1])$ onto $\mathcal{F}_I$. The
mean squared error of a histogram estimator $\hat g_I$ can be written as the sum of a bias term
and a variance term

$$\mathbb{E}\big[\|g - \hat g_I\|_2^2\big] = \|g - g_I\|_2^2 + \mathbb{E}\big[\|g_I - \hat g_I\|_2^2\big].$$

We introduce three lemmas that are needed to prove Proposition 3. The proofs of these
technical lemmas are postponed to Appendix A.

Lemma 1. Let $I = (I_k)_{k=1}^{D}$ be an arbitrary partition of $[0, 1]$. Then the variance term of
the mean squared error of a histogram estimator $\hat g_I$ is bounded by $C/n$, where $C$ is a positive
constant. In other words,

$$\mathbb{E}\big[\|\hat g_I - g_I\|_2^2\big] = O\Big(\frac{1}{n}\Big).$$

For any partition $I = (I_k)_{k=1,\ldots,D}$ of $[0, 1]$, we let

$$L(I) = \|g_I - g\|_2^2 \quad \text{and} \quad \hat L_p(I) = \hat R_p(I) + \|g\|_2^2$$

be respectively the bias term of the mean squared error of a histogram estimator $\hat g_I$ and its
estimator.

Lemma 2. Let $I = (I_k)_{k=1,\ldots,D}$ be an arbitrary partition of $[0, 1]$. Let $p \in \{1, 2, \ldots, n - 1\}$
be such that $\lim_{n\to\infty} p/n < 1$. Then we have the following results:

i) $\hat L_p(I) \xrightarrow[n\to\infty]{a.s.} L(I)$.

ii) $\sqrt{n}\big(\hat L_p(I) - L(I)\big) = \sqrt{n}\big(\hat R_p(I) - R(I)\big) + \frac{1}{\sqrt{n}}(s_{11} - s_{21}) \xrightarrow[n\to\infty]{d} \mathcal{N}(0, 4\sigma_I^2)$, where
$\sigma_I^2 = s_{32} - s_{21}^2$ with $s_{ij} = \sum_k \alpha_k^i / |I_k|^j$ for all $(i, j) \in \mathbb{N}^2$.
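
To make the objects of Lemma 2 concrete, the sketch below evaluates a leave-p-out risk estimate and the constants $s_{ij}$ on a toy partition. It assumes the closed form $\hat R_p(I) = \frac{2n-p}{(n-1)(n-p)} \sum_k \frac{n_k}{n|I_k|} - \frac{n(n-p+1)}{(n-1)(n-p)} \sum_k \frac{1}{|I_k|} \big(\frac{n_k}{n}\big)^2$, as read from the proof of the lemma in Appendix A.2; the function names, the partition and the cell probabilities are hypothetical choices made for the illustration.

```python
def lpo_risk(counts, widths, p):
    """Leave-p-out risk estimate of a histogram (closed form assumed above)."""
    n = sum(counts)
    denom = (n - 1) * (n - p)
    linear = sum(nk / (n * w) for nk, w in zip(counts, widths))
    quadratic = sum((nk / n) ** 2 / w for nk, w in zip(counts, widths))
    return (2 * n - p) / denom * linear - n * (n - p + 1) / denom * quadratic


def s(alphas, widths, i, j):
    """s_ij = sum over cells of alpha_k**i / |I_k|**j."""
    return sum(a ** i / w ** j for a, w in zip(alphas, widths))


widths = [0.25, 0.25, 0.5]          # cell lengths |I_k| (hypothetical partition)
alphas = [0.5, 0.3, 0.2]            # cell probabilities alpha_k (hypothetical)
n, p = 1_000_000, 1
counts = [round(n * a) for a in alphas]   # idealized counts n_k = n * alpha_k

risk = lpo_risk(counts, widths, p)
s21 = s(alphas, widths, 2, 1)
sigma2 = s(alphas, widths, 3, 2) - s21 ** 2
print(risk, -s21, sigma2)
```

With idealized counts and a large $n$, the risk estimate is close to $-s_{21}$, so that $\hat L_p(I) = \hat R_p(I) + \|g\|_2^2$ is close to $L(I) = \|g\|_2^2 - s_{21}$, in line with point i). The last printed value is the variance constant $\sigma_I^2$ of point ii).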

Let $I$, $J$ be two partitions in $\mathcal{I}$. Then $I$ is called a subdivision of $J$, and we denote $I \preceq J$,
if $\mathcal{F}_J \subset \mathcal{F}_I$; we write $I \not\preceq J$ otherwise.

Lemma 3. Suppose that the function $f$ satisfies Assumption 1. Let us consider $m_{\max}$ large
enough such that $\mu^* - \lambda^* > 2^{1 - m_{\max}}$. Define $N = 2^{m_{\max}}$ and $I^{(N)} = (N, \lambda_N, \mu_N) \in \mathcal{I}$ with
$\lambda_N = \lceil N\lambda^* \rceil / N$, $\mu_N = \lfloor N\mu^* \rfloor / N$. Then for every partition $I \in \mathcal{I}$, we have

i) If $I$ is a subdivision of $I^{(N)}$, then $L(I) = L(I^{(N)})$.

ii) If $I$ is not a subdivision of $I^{(N)}$, then $L(I) > L(I^{(N)})$.

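We are now ready to prove Proposition 3, starting by establishing point i). First, we
remark that under condition (9), Celisse and Robin prove in their Proposition 2.1 that

$$\frac{\hat p(I)}{n} \xrightarrow[n\to\infty]{a.s.} l_\infty(I) \in [0, 1).$$

Denoting by $\Lambda^* = [\lambda^*, \mu^*]$ and $\hat\Lambda = [\hat\lambda, \hat\mu]$, we may write

$$\hat\theta_n^{CR} = \theta + (\hat\theta_n^{CR} - \theta)\, 1\{\hat I \preceq I^{(N)}\} + (\hat\theta_n^{CR} - \theta)\, 1\{\hat I \not\preceq I^{(N)}\}$$
$$= \theta + \sum_{I = (N,\lambda,\mu) \preceq I^{(N)}} \Big[\frac{1}{n(\mu - \lambda)} \sum_{i=1}^{n} 1\{X_i \in [\lambda, \mu]\} - \theta\Big]\, 1\{\hat\lambda = \lambda, \hat\mu = \mu\}
+ (\hat\theta_n^{CR} - \theta)\, 1\{\hat I \not\preceq I^{(N)}\}, \qquad (22)$$

where $N = 2^{m_{\max}}$ as in Lemma 3. For each partition $I = (N, \lambda, \mu) \preceq I^{(N)}$ we have
$[\lambda, \mu] \subseteq \Lambda^*$. By applying the strong law of large numbers we get that

$$\frac{1}{n(\mu - \lambda)} \sum_{i=1}^{n} 1\{X_i \in [\lambda, \mu]\} \xrightarrow[n\to\infty]{a.s.} \frac{\mathbb{P}(X_1 \in [\lambda, \mu])}{\mu - \lambda} = \theta.$$

Since the cardinality $\mathrm{card}(\mathcal{I})$ of $\mathcal{I}$ is finite and does not depend on $n$, in order to finish the
proof, it is sufficient to establish that

$$(\hat\theta_n^{CR} - \theta)\, 1\{\hat I \not\preceq I^{(N)}\} \xrightarrow[n\to\infty]{a.s.} 0.$$

Using Lemma 3, we have $L(I) > L(I^{(N)})$ for every $I \not\preceq I^{(N)}$. Letting

$$\gamma = \min_{I \not\preceq I^{(N)}} L(I) - L(I^{(N)}) > 0, \qquad (23)$$

we obtain that

$$|\hat\theta_n^{CR} - \theta|\, 1\{\hat I \not\preceq I^{(N)}\} \le (N - \theta)\, 1\{L(\hat I) - L(I^{(N)}) \ge \gamma\}$$
$$\le (N - \theta)\, 1\big\{|\hat L_{\hat p(\hat I)}(\hat I) - L(\hat I)| + |\hat L_{\hat p(I^{(N)})}(I^{(N)}) - L(I^{(N)})| + \hat L_{\hat p(\hat I)}(\hat I) - \hat L_{\hat p(I^{(N)})}(I^{(N)}) \ge \gamma\big\}$$
$$\le (N - \theta)\, 1\big\{2 \sup_{I \in \mathcal{I}} |\hat L_{\hat p(I)}(I) - L(I)| + \hat L_{\hat p(\hat I)}(\hat I) - \hat L_{\hat p(I^{(N)})}(I^{(N)}) \ge \gamma\big\}.$$

By definition of $\hat I$, we have $\hat L_{\hat p(\hat I)}(\hat I) - \hat L_{\hat p(I^{(N)})}(I^{(N)}) \le 0$, so that

$$|\hat\theta_n^{CR} - \theta|\, 1\{\hat I \not\preceq I^{(N)}\} \le (N - \theta)\, 1\Big\{\sup_{I \in \mathcal{I}} |\hat L_{\hat p(I)}(I) - L(I)| \ge \frac{\gamma}{2}\Big\} \qquad (24)$$
$$\le (N - \theta) \sum_{I \in \mathcal{I}} 1\Big\{|\hat L_{\hat p(I)}(I) - L(I)| \ge \frac{\gamma}{2}\Big\}.$$

Since for all $I \in \mathcal{I}$ we have both $\hat L_p(I) \xrightarrow[n\to\infty]{a.s.} L(I)$ and $\hat p(I)/n \xrightarrow[n\to\infty]{a.s.} l_\infty(I) \in [0, 1)$, as well as the
fact that $\hat R_p(I)$ (given by (8)) is a continuous function of $p/n$, we obtain $\hat L_{\hat p(I)}(I) \xrightarrow{a.s.} L(I)$.
Therefore,

$$1\Big\{|\hat L_{\hat p(I)}(I) - L(I)| \ge \frac{\gamma}{2}\Big\} \xrightarrow[n\to\infty]{a.s.} 0.$$

Indeed, if $X_n \xrightarrow{a.s.} X$ then for all $\epsilon > 0$, we have $1\{|X_n - X| \ge \epsilon\} \xrightarrow{a.s.} 0$. It thus follows that
$(\hat\theta_n^{CR} - \theta)\, 1\{\hat I \not\preceq I^{(N)}\} \xrightarrow{a.s.} 0$. We finally get that $\hat\theta_n^{CR} \xrightarrow{a.s.} \theta$.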
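We now turn to point ii). We may write as previously,

$$\sqrt{n}(\hat\theta_n^{CR} - \theta) = \sum_{I = (N,\lambda,\mu) \preceq I^{(N)}} \Big[\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big(\frac{1}{\mu - \lambda}\, 1\{X_i \in [\lambda, \mu]\} - \theta\Big)\Big]\, 1\{\hat\lambda = \lambda, \hat\mu = \mu\}
+ \sqrt{n}(\hat\theta_n^{CR} - \theta)\, 1\{\hat I \not\preceq I^{(N)}\}.$$

For each partition $I = (N, \lambda, \mu) \preceq I^{(N)}$, by applying the central limit theorem, we get that

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big(\frac{1}{\mu - \lambda}\, 1\{X_i \in [\lambda, \mu]\} - \theta\Big) \xrightarrow[n\to\infty]{d} \mathcal{N}\Big(0,\ \theta\Big(\frac{1}{\mu - \lambda} - \theta\Big)\Big).$$

Hence, using again that $\mathrm{card}(\mathcal{I})$ is finite,

$$\sum_{I = (N,\lambda,\mu) \preceq I^{(N)}} \Big[\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big(\frac{1}{\mu - \lambda}\, 1\{X_i \in [\lambda, \mu]\} - \theta\Big)\Big]\, 1\{\hat\lambda = \lambda, \hat\mu = \mu\} = O_P(1). \qquad (25)$$

We shall now prove that $\sqrt{n}(\hat\theta_n^{CR} - \theta)\, 1\{\hat I \not\preceq I^{(N)}\} \to 0$ in probability. In fact, according to (24), for all
$\epsilon > 0$, we have

$$\mathbb{P}\big(\sqrt{n}\,|\hat\theta_n^{CR} - \theta|\, 1\{\hat I \not\preceq I^{(N)}\} > \epsilon\big) \le \mathbb{P}(\hat I \not\preceq I^{(N)})
\le \mathbb{P}\Big(\sup_{I \in \mathcal{I}} |\hat L_{\hat p(I)}(I) - L(I)| \ge \frac{\gamma}{2}\Big)
\le \sum_{I \in \mathcal{I}} \mathbb{P}\Big(|\hat L_{\hat p(I)}(I) - L(I)| \ge \frac{\gamma}{2}\Big) \xrightarrow[n\to\infty]{} 0,$$

where $\gamma$ is defined by (23). Therefore, $\sqrt{n}(\hat\theta_n^{CR} - \theta)\, 1\{\hat I \not\preceq I^{(N)}\} = o_P(1)$. We finally conclude
that $\sqrt{n}(\hat\theta_n^{CR} - \theta) = O_P(1)$.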

We now prove the last statement iii) of the proposition. We have

$$\mathrm{Var}(\sqrt{n}\,\hat\theta_n^{CR}) \le \mathbb{E}\big[\big(\sqrt{n}(\hat\theta_n^{CR} - \theta)\big)^2\big],$$

where



E E 

I=(N,X,tJ.)<I (N) 



1 'J2(^HXie[x,fA}-e)) 2 i 



in \ <■ — ' 1 ii — A 

i=i 



{X=X,jl=^} 



E[(Vn-(eZ R -e)fi{iiiiW}]. 



The first term of the above equation is bounded as in the proof of Proposition 2 (see 
inequalities (20) and (21)) 



{A=A,A=/4 



I=(N,\,fi)<l( N ) 



< 



E 

I=(N,\,n)<l( N l 



1 



40 



nV( M -A) 3 (yu - A) 2 /i-A 



n \(/i — A) 



The second term is bounded by 

E[(^n(9% R - 6)) 2 l{i £ I (Ar) }] < (iV-0) 2 nP(I^ J^) 



7* 



< (iV - #) 2 nP(sup|L p (I) - L(I)| > 

/ex 2 



< (iV-0) 2 n^P(|L p (/)-L(/)|> 
/ex 



For each partition $I \in \mathcal{I}$, according to the calculations in the proof of Lemma 1, we have

2n — p k n k n(n — p + 1) \ 1 f n k \2 



L„(7) - L(J) 



E 



E 



(n - l){n-p) ^ n\I k \ (n - l)(n-p) \I k \ n 



+ 321 



2n — p ( sr~^ 1 /Hfc 



(n - l)(n-p) I ^ |4| V n 
n{n — v— v 1 /nfc \ 2 



E 



(re - l)(n-p) ^ \I k \ \ n 

2n(n-p + 1) v - a k fn k 
(n - l)(n-p) Z-j |4| \ n 



Oik)- 



This leads to 



n\L p {i)-L{i)\>±) < 



+ 



El / n k 



~~ 6(2n — p) 



'(E 
+ P (|E 



J_fl^_ \ 2 > (w- l)(n-p)7 
|4^| \ n / 6n(n — p+1) 



|4| V n 



oik 



> 



(n — l)(n — p)7 
12n(n-p + 1) 4 
According to Hoeffding's inequality, we have



(n- l)(n-p)7 

> — : S21 - Sll 



6(2n — p) 



n(n — l)(n — ri)l 
> V -.- - — — - n \s 21 - sn 



i=i fc 
< 2exp[-2n(^ 



1 \- 2 /(n-l)(n-p)7 ^ 



6(2n — p) 



6(2n — p) 

|S21 - sn|) 



as well as 

\\^\I k \\n 

and 



> (n - 1)(n - phN )<2exp 
~~ 12n(n-p + 1) / ~ 



— 2ns 



ii 



/ (to - l)(n -p)7 \ 2 
V 12n(n-p + 1) / . 



1 



'(l?iib( 



> 



(n — 



l)(n-p)7 \ 



6n(n — p + 1) / 



s E 1 



to 
n 



2 > 14 1 (™ - l)(ro-p)7 



6Dn(n — p + 1) 

> 



^ E p (|E( 1 ^ e/ ^- a 

fc i=i 

^_ 2 ^|4|(«-I)(n-P)7 



: - ! |4|n(n - l)(n -p)7 



< 2exp 



6L>(n -p + 1) 
Hence, we obtain that $n\,\mathbb{P}\big(|\hat L_p(I) - L(I)| \ge \gamma/2\big) \to 0$ as $n \to +\infty$. Finally, we conclude that

$$\limsup_{n\to+\infty} \mathrm{Var}(\sqrt{n}\,\hat\theta_n^{CR}) < +\infty.$$
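A Appendix. Proofs of technical lemmas

A.1 Proof of Lemma 1

Note that Celisse and Robin (2010) prove that $\mathbb{E}[\|\hat g_I - g_I\|_2^2] \to 0$, while we further
establish that it is $O(1/n)$. By a simple bias-variance decomposition, we may write

$$\mathbb{E}\big[\|\hat g_I - g_I\|_2^2\big] = \mathbb{E}\big[\|g - \hat g_I\|_2^2\big] - \|g_I - g\|_2^2.$$

As for the bias term, it is easy to show that

$$\|g - g_I\|_2^2 = \inf_{h \in \mathcal{F}_I} \|g - h\|_2^2
= \inf_{(a_k)_k} \Big[\|g\|_2^2 - 2 \int_0^1 \Big(\sum_k a_k 1_{I_k}(x)\Big) g(x)\,dx + \int_0^1 \Big(\sum_k a_k 1_{I_k}(x)\Big)^2 dx\Big]$$
$$= \inf_{(a_k)_k} \Big[\|g\|_2^2 - 2 \sum_k a_k \alpha_k + \sum_k a_k^2 |I_k|\Big]
= \|g\|_2^2 - s_{21}. \qquad (26)$$

Let us now calculate the mean squared error of $\hat g_I$:

$$\mathbb{E}\big[\|g - \hat g_I\|_2^2\big] = \|g\|_2^2 + \mathbb{E}\Big[\|\hat g_I\|_2^2 - 2 \int_0^1 \hat g_I(x) g(x)\,dx\Big]$$
$$= \|g\|_2^2 + \mathbb{E}\Big[\int_0^1 \Big(\sum_k \frac{n_k}{n|I_k|} 1_{I_k}(x)\Big)^2 dx - 2 \sum_k \frac{n_k}{n|I_k|}\, \alpha_k\Big]
= \|g\|_2^2 + \mathbb{E}\Big[\sum_k \frac{n_k^2}{n^2 |I_k|} - 2 \sum_k \frac{\alpha_k n_k}{n |I_k|}\Big].$$

Since $n_k$ follows a Binomial distribution $\mathcal{B}(n, \alpha_k)$, we have

$$\mathbb{E}[n_k] = n\alpha_k \quad \text{and} \quad \mathbb{E}[n_k^2] = n^2\alpha_k^2 + n\alpha_k(1 - \alpha_k).$$

Therefore,

$$\mathbb{E}\big[\|g - \hat g_I\|_2^2\big] = \|g\|_2^2 + \sum_k \frac{n^2\alpha_k^2 + n\alpha_k(1 - \alpha_k)}{n^2 |I_k|} - 2 \sum_k \frac{\alpha_k^2}{|I_k|}
= \|g\|_2^2 - s_{21} + \frac{1}{n}(s_{11} - s_{21}). \qquad (27)$$

Using (26) and (27), we obtain the desired result, namely

$$\mathbb{E}\big[\|\hat g_I - g_I\|_2^2\big] = \mathbb{E}\big[\|g - \hat g_I\|_2^2\big] - \|g_I - g\|_2^2 = \frac{1}{n}(s_{11} - s_{21}) = O\Big(\frac{1}{n}\Big).$$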
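
The closed form $(s_{11} - s_{21})/n$ for the variance term can be checked by simulation. The sketch below is only an illustration on a hypothetical partition with hypothetical cell probabilities: it draws multinomial counts, computes $\|\hat g_I - g_I\|_2^2 = \sum_k (n_k/n - \alpha_k)^2 / |I_k|$ for each draw, and compares the Monte Carlo average with the closed form.

```python
import random


def variance_term_exact(alphas, widths, n):
    """Closed form (s_11 - s_21) / n of E||g_hat_I - g_I||^2."""
    s11 = sum(a / w for a, w in zip(alphas, widths))
    s21 = sum(a ** 2 / w for a, w in zip(alphas, widths))
    return (s11 - s21) / n


def variance_term_monte_carlo(alphas, widths, n, repetitions, seed=0):
    """Monte Carlo average of ||g_hat_I - g_I||^2 over multinomial samples."""
    rng = random.Random(seed)
    cells = list(range(len(alphas)))
    total = 0.0
    for _ in range(repetitions):
        counts = [0] * len(alphas)
        for k in rng.choices(cells, weights=alphas, k=n):
            counts[k] += 1
        # Squared L2 distance between the histogram and the projection g_I.
        total += sum((c / n - a) ** 2 / w
                     for c, a, w in zip(counts, alphas, widths))
    return total / repetitions


widths = [0.25, 0.25, 0.5]   # cell lengths |I_k| (hypothetical partition)
alphas = [0.5, 0.3, 0.2]     # cell probabilities alpha_k (hypothetical)
exact = variance_term_exact(alphas, widths, n=200)
mc = variance_term_monte_carlo(alphas, widths, n=200, repetitions=2000)
print(exact, mc)
```

The two values should agree up to Monte Carlo error, and the closed form makes the $1/n$ decay of the variance term explicit.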
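A.2 Proof of Lemma 2

i) Since

$$\lim_{n\to\infty} \frac{p}{n} < 1 \quad \text{and} \quad \frac{n_k}{n} \xrightarrow[n\to\infty]{a.s.} \alpha_k, \ \text{for all } k,$$

we obtain that

$$\hat L_p(I) = \|g\|_2^2 + \frac{2n - p}{(n-1)(n-p)} \sum_k \frac{n_k}{n|I_k|} - \frac{n(n-p+1)}{(n-1)(n-p)} \sum_k \frac{1}{|I_k|} \Big(\frac{n_k}{n}\Big)^2
\xrightarrow[n\to\infty]{a.s.} \|g\|_2^2 - \sum_k \frac{\alpha_k^2}{|I_k|} = \|g\|_2^2 - s_{21} = \|g_I - g\|_2^2 = L(I).$$

ii) By definition of $R(I)$ and using (27), we have

$$R(I) = \mathbb{E}\big[\|g - \hat g_I\|_2^2\big] - \|g\|_2^2 = -s_{21} + \frac{1}{n}(s_{11} - s_{21}).$$

This gives that

$$\sqrt{n}\big[\hat R_p(I) - R(I)\big] = \sqrt{n}\Big[\frac{2n-p}{(n-1)(n-p)} \sum_k \frac{n_k}{n|I_k|} - \frac{n(n-p+1)}{(n-1)(n-p)} \sum_k \frac{1}{|I_k|} \Big(\frac{n_k}{n}\Big)^2 + s_{21} - \frac{1}{n}(s_{11} - s_{21})\Big]$$
$$= \frac{2n-p}{(n-1)(n-p)} \sum_k \frac{1}{|I_k|} \Big[\sqrt{n}\Big(\frac{n_k}{n} - \alpha_k\Big)\Big] + \frac{(2n-p)\sqrt{n}}{(n-1)(n-p)}\, s_{11}
- \frac{n(n-p+1)}{\sqrt{n}(n-1)(n-p)} \sum_k \frac{1}{|I_k|} \Big[\sqrt{n}\Big(\frac{n_k}{n} - \alpha_k\Big)\Big]^2$$
$$\quad - \frac{(2n-p)\sqrt{n}}{(n-1)(n-p)}\, s_{21} - \frac{1}{\sqrt{n}}(s_{11} - s_{21})
- \frac{2n(n-p+1)}{(n-1)(n-p)} \sum_k \frac{\alpha_k}{|I_k|} \Big[\sqrt{n}\Big(\frac{n_k}{n} - \alpha_k\Big)\Big]$$
$$=: T_1 - \frac{2n(n-p+1)}{(n-1)(n-p)} \sum_k \frac{\alpha_k}{|I_k|} \Big[\sqrt{n}\Big(\frac{n_k}{n} - \alpha_k\Big)\Big]. \qquad (28)$$

Then, using the central limit theorem and the continuity of the function $x \mapsto x^2$, we have

$$\sqrt{n}\Big(\frac{n_k}{n} - \alpha_k\Big) \xrightarrow[n\to\infty]{d} \mathcal{N}\big(0, \alpha_k(1 - \alpha_k)\big),
\qquad
\Big[\sqrt{n}\Big(\frac{n_k}{n} - \alpha_k\Big)\Big]^2 \xrightarrow[n\to\infty]{d} Z_k^2 \ \text{with}\ Z_k \sim \mathcal{N}\big(0, \alpha_k(1 - \alpha_k)\big).$$

It thus follows that $T_1 = o_P(1)$. We now consider the remaining term in (28). We have

$$\sum_k \frac{\alpha_k}{|I_k|} \Big[\sqrt{n}\Big(\frac{n_k}{n} - \alpha_k\Big)\Big] = \sqrt{n} \sum_k \frac{\alpha_k}{|I_k|} \frac{n_k}{n} - \sqrt{n}\, s_{21}
= \frac{1}{\sqrt{n}} \sum_k \frac{\alpha_k}{|I_k|} \Big(\sum_{i=1}^{n} 1\{X_i \in I_k\}\Big) - \sqrt{n}\, s_{21}
= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big(\sum_k \frac{\alpha_k}{|I_k|}\, 1\{X_i \in I_k\} - s_{21}\Big).$$

Let us denote

$$Y_i = \sum_k \frac{\alpha_k}{|I_k|}\, 1\{X_i \in I_k\} - s_{21}.$$

Then the random variables $Y_1, Y_2, \ldots, Y_n$ are iid centered with variance

$$\sigma_I^2 = \mathbb{E}(Y_1^2) = \mathbb{E}\Big(\sum_k \frac{\alpha_k^2}{|I_k|^2}\, 1\{X_1 \in I_k\} - 2 s_{21} \sum_k \frac{\alpha_k}{|I_k|}\, 1\{X_1 \in I_k\} + s_{21}^2\Big) = s_{32} - s_{21}^2.$$

By the central limit theorem, we obtain

$$\sum_k \frac{\alpha_k}{|I_k|} \Big[\sqrt{n}\Big(\frac{n_k}{n} - \alpha_k\Big)\Big] \xrightarrow[n\to\infty]{d} \mathcal{N}(0, \sigma_I^2).$$

Combining this with (28) implies that

$$\sqrt{n}\big[\hat R_p(I) - R(I)\big] \xrightarrow[n\to\infty]{d} \mathcal{N}(0, 4\sigma_I^2).$$

It is easy to calculate that

$$\sqrt{n}\big[\hat L_p(I) - L(I)\big] = \sqrt{n}\big[\hat R_p(I) - R(I)\big] + \frac{1}{\sqrt{n}}(s_{11} - s_{21}).$$

Hence, we have

$$\sqrt{n}\big[\hat L_p(I) - L(I)\big] \xrightarrow[n\to\infty]{d} \mathcal{N}(0, 4\sigma_I^2),$$

which completes the proof.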
A.3 Proof of Lemma 3

i) If $I$ is a subdivision of $I^{(N)}$, then $I = (N, \lambda, \mu)$ with $[\lambda, \mu] \subset [\lambda^*, \mu^*]$. For example, we
may have the following situation

[Diagram: relative positions on $[0, 1]$ of $\lambda^*$, $\lambda_N$, $\lambda$, $\mu$, $\mu_N$, $\mu^*$ for the partitions $I^{(N)}$ and $I$.]

Since $g$ is constant on the interval $[\lambda^*, \mu^*] \supseteq [\lambda_N, \mu_N] \supseteq [\lambda, \mu]$, we have $g_I = g_{I^{(N)}} = g$
on the interval $[\lambda_N, \mu_N]$. This implies that $\|g_I - g\|_2^2 = \|g_{I^{(N)}} - g\|_2^2$.

ii) If $I = (2^m, \lambda, \mu)$ is not a subdivision of $I^{(N)}$, then there are two cases to consider.
If $m = m_{\max}$ then $[\lambda, \mu] \not\subseteq [\lambda_N, \mu_N]$. For example, we may have

[Diagram: relative positions on $[0, 1]$ of $\lambda$, $\lambda^*$, $\lambda_N$, $\mu$, $\mu_N$, $\mu^*$ for the partitions $I^{(N)}$ and $I$.]

Since $g_I = g_{I^{(N)}} = g$ on the interval $[\lambda_N, \mu_N]$ and the two partitions $I$ and $I^{(N)}$ restricted
to the set $[\lambda, \mu]^c \cap [\lambda_N, \mu_N]^c$ are the same, we thus have

$$\|g_I - g\|_{2,[\lambda,\mu]^c}^2 = \|g_{I^{(N)}} - g\|_{2,[\lambda,\mu]^c}^2,$$

so that

$$\|g_I - g\|_2^2 - \|g_{I^{(N)}} - g\|_2^2 = \|g_I - g\|_{2,[\lambda,\mu]}^2 - \|g_{I^{(N)}} - g\|_{2,[\lambda,\mu]}^2.$$

Using the monotonicity of $f$ on the intervals $[0, \lambda^*]$ and $[\mu^*, 1]$, we get that
$\|g_I - g\|_{2,[\lambda,\mu]}^2 > \|g_{I^{(N)}} - g\|_{2,[\lambda,\mu]}^2$, which implies that $L(I) > L(I^{(N)})$.
If $m < m_{\max}$, we may have for example

[Diagram: relative positions on $[0, 1]$ of $\lambda$, $\lambda^*$, $\lambda_N$, $\mu$, $\mu_N$, $\mu^*$ when $I$ is built on a coarser grid than $I^{(N)}$.]

As before, we may show that

$$\|g_I - g\|_2^2 - \|g_{I^{(N)}} - g\|_2^2 \ge \|g_I - g\|_{2,[\lambda,\mu]^c}^2 - \|g_{I^{(N)}} - g\|_{2,[\lambda,\mu]^c}^2 > 0,$$

which completes the proof.

We remark that the assumptions in Lemma 2.1 or Theorem 2.1 in Celisse and Robin
(2010) are not sufficient to show these results. In fact, the assumption "$g$ is non-constant
outside $\Lambda^*$" is not sufficient to imply that $\|g - g_{I^{(N)}}\|_2^2 < \|g - g_I\|_2^2$ in the case where $I$ is
not a subdivision of $I^{(N)}$. For example, let us consider the following situation

[Diagram: a cell $[a, b]$ of $I$ located outside $\Lambda^*$ and split by $I^{(N)}$ into $[a, c]$ and $[c, b]$.]

We may then calculate that

$$\|g - g_I\|_2^2 - \|g - g_{I^{(N)}}\|_2^2 = (c - a)(\alpha_1 - \alpha)^2 + (b - c)(\alpha_2 - \alpha)^2,$$

where

$$\alpha = \frac{1}{b - a} \int_a^b g(x)\,dx, \qquad \alpha_1 = \frac{1}{c - a} \int_a^c g(x)\,dx, \qquad \alpha_2 = \frac{1}{b - c} \int_c^b g(x)\,dx.$$

So that if the function $g$ satisfies $\alpha = \alpha_1 = \alpha_2$ (and $g$ is non-constant outside $\Lambda^*$) then
$\|g - g_{I^{(N)}}\|_2^2 = \|g - g_I\|_2^2$.



